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THE EFFECT OF THE POINT SPREAD FUNCTION ON THE GEOSTATISTICAL 

DOWNSCALING OF CONTINUA 

Q. Wang \ P.M. Atkinson 2,3,4,s 
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Ireland, UK 
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Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 

100101, China 


The point spread function (PSF) effect exists ubiquitously in remotely sensed images and 
introduces great uncertainty in the change-of-support problem (COSP) of image downscaling. We 
investigate the PSF effect on downscaling continua (e.g., pixel radiance, reflectance, brightness, 
etc.). Two general cases are considered: downscaling a single image and multi-resolution image 
fusion. In the former case, the input coarse image is the only available image for downscaling, 
while in the latter case, finer spatial resolution imagery observed in different wavelengths is 
available as auxiliary data for downscaling the coarse image. 

The geostatistics-based area-to-point kriging (ATPK) method can account for the PSF effect and 
is proposed for downscaling a single image. A WorldView-2 image was used in an experiment. It 
contains eight multispectral bands with a spatial resolution of 2 m. The image was degraded with 
a PSF filter and a zoom factor of S. The Gaussian filter with a width of half pixel size was used. 
Two zoom factors were considered, 2 and 4. The task of downscaling is to restore the 2 m image 
from the input 4 m (or 8 m) coarse image. Fig. 1 shows the downscaling results for the 4 m coarse 
image. The ATPK method with ideal square wave filter (i.e., no PSF) and Gaussian PSF (width is 
0.5 coarse pixel size) produces the results in Fig. 1 (c) and Fig. 1(d). It is clear that by considering 
the PSF effect, the downscaling prediction is more accurate. For example, the buildings in Fig. 
1(d) (see the marked yellow area) are clearer than those in Fig. 1(c), and closer to the reference in 
Fig. 1(b). Table 1 shows the quantitative accuracy assessment (based on Correlation coefficient 
(CC), universal image quality index (UIQI), and relative global-dimensional synthesis error 
(ERGAS)) for the two coarse images. The ATPK with PSF method can produce more accurate 
results. 
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Fig. 1 . Downscaling result for the 4 m coarse WorldView-2 image, (a) 4 m coarse image, (b) 2 m 
reference, (c) ATPK without PSF. (d) ATPK with Gaussian PSF. 


Table 1 . Quantitative assessment for the downscaling results for the WorldView -2 image 




CC 

ERGAS 

UIQI 

Ideal 

1 

0 

1 

S=2 

ATPK (no PSF) 

0.9436 

3.6724 

0.9334 

ATPK (PSF) 

0.9585 

3.0665 

0.9579 

S=4 

ATPK (no PSF) 

0.8435 

2.9440 

0.8097 

ATPK (PSF) 

0.8606 

2.7502 

0.8501 
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For multi-resolution image fusion, the recently developed area-to-point regression kriging 
(ATPRK) approach is proposed. A case study involving fusing 20 m and 10 m Sentinel-2 images 
covering an urban area was performed. The Sentinel-2 image was acquired on 18 August 2015 and 
covers an 8 km by 8 km area in Verona, Italy. The six 20 m bands were downscaled to 10 m bands 
by fusing with the four observed 10 m bands. For visual comparison, the results produced by 
ATPRK without PSF and with PSF are exhibited in Fig. 2(b) and Fig. 2(c), respectively. Obviously, 
More details of the urban fabric (e.g., roads and buildings, see the marked yellow area) were 
preserved when the PSF was considered, confirming the efficacy of the proposed approach. 




Fig. 2. Downscaling result for the Sentinel-2 image (bands 12, 8a and 5 as RGB), (a) 20 m coarse 
image, (b) ATPRK without PSF. (c) ATPRK with PSF. 
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GEOSTATISTICAL SIMULATIONS OF THE FULL 3D BLOCK HYDRAULIC 
CONDUCTIVITY TENSOR CONSIDERING INNER LOCAL-SCALE VARIABILITY 

N. Benoit 1,2 , D, Marcotte 1 , J.W. Molson 3 , P. Pasquier 2 

Geological Survey of Canada/Natural Resources Canada, 490 rue de la Couronne, Quebec, 

Canada, nicolas.benoit@canada.ca 

2 Ecole Polytechnique de Montreal, 2900 Boul. Edouard-Montpetit, Montreal, Canada 
3 Universite Laval, 1065 avenue de la Medecine, Quebec, Canada 


Introduction 

Heterogeneities in hydraulic conductivity (K) have a major impact on groundwater flow and 
contaminant transport. Local field measurements, however, typically focus on only one or at most 
a few local hydrofacies of hydrostratigraphic units (HSUs). This is insufficient to characterize 
regional hydrostratigraphic systems where numerous HSUs exist, each showing a range of Re¬ 
values spanning several orders of magnitude. For numerical modelling at the regional scale, the K 
field is usually discretized in blocks. Quasi-point local conductivities must therefore be upscaled 
to equivalent block conductivity tensors accounting for the structural connectivity of hydrofacies 
at the regional scale and textural conductivity correlations at the local scale (Gomez-Hemandez 
and Joumel 1990). 

In this study, a multi-step upscaling method was applied to define the spatial distribution of the 
full 3D hydraulic block conductivity tensor considering the effect of local-scale variability. The 
approach comprises four main steps: (i) local-scale simulation of K for each HSU, (ii) upscaling 
of local (point) realizations into the full 3D block K-tensors using a block by block finite element 
flow simulator, (iii) definition of the spatial covariance and cross-covariance of the block K-tensor, 
and (iv) direct geostatistical simulation of block K-tensors at the regional scale. The approach was 
tested for regional groundwater flow in a complex hydrostratigraphic system of the Innisfil Creek 
watershed, Ontario, Canada (Fig 1) (Benoit et al., 2017). We stress that K-tensor variability effects 
are studied only within each HSU, as the same deterministic HSU model is used for all realizations. 
Including uncertainty in the hydrostratigraphic model itself is also possible (Benoit et al. 2017). 
However, it was our objective to isolate only the effect of K-tensor variability within the HSUs. 
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Canada 


Study area * 


East 


AJp 
* North 




Fig. 1. 3D hydrostratigraphic model of the Innisfil Creek watershed (location of the regional cross¬ 


section indicated with arrows) and boxplots of K from grain size analyses (* symbols are 


laboratory measurements ofK). 


Data and methods 


The regional geological model of the Innisfil Creek watershed was provided by the Ontario 
Geological Survey (Bajc, personal communication 2016). This model includes 14 HSUs of 
unconsolidated sediments, discretized into 200 x 200 x 1 m block HSUs. Each block is assumed 
to be part of only a single HSU. Each HSU includes different hydrofacies described by a specific 
distribution of local K values (Fig. 1). The K database was built using 1,086 transmissivity 
measurements extracted from public wells, 32 soil samples recovered from boreholes analysed in 
the laboratory with constant and falling head permeability tests (Pasquier et al. 2016) and 1,694 
grain size analyses from soil sampling in 15 boreholes located in the study area (Bajc et al. 2015). 
These field data were used for local scale assessments of K, which produced a highly variable K 
distribution both between and within the HSUs (Fig. 1). From the K boxplots, six HSUs are 
characteristic of aquitards, six of aquifers, and two appear transitional. 

Herein, the collected grain size samples provide the only high-resolution information. Different 
methods for assessment of K values exist in the literature based on grain size analysis. Among the 
several tested methods, Sauerbrei’s method (Vukovic and Soro, 1992) was selected as it allows 
enough flexibility to cover the wide range of laboratory K measurements (see * symbols in 
boxplots in Fig. 1). 

The collected grain size samples have high resolution in the vertical directions along the boreholes. 
Vertical variograms of ln(K) were modelled separately for each of the 14 HSUs. These variograms 
were modelled either by spherical (12 HSUs) or exponential (2 HSUs) structures with a nugget 
effect (9 HSUs) representing 1-49% of the total variance. The practical vertical ranges vary 
between 1.1 and 19.5 m. Due to the large distances between the boreholes, the horizontal 
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variograms could not be defined based on the grain size analyses. Instead, a global horizontal range 
of about 1,800 m was estimated from K values based on specific capacity tests with relatively 
dense spatial coverage, which represents a horizontal/vertical range anisotropy ratio of about 300. 
This ratio was applied to each HSU, with the anisotropy ratio of 300 being comparable to the 
horizontal/vertical block length ratio in the model. 

Scalar K fields were simulated using a non-conditional turning band simulation (TBS) algorithm 
(Emery and Lantuejoul 2006) over a grid of 120 x 120 x 120 points on a mesh of size 2 m x 2 m x 
0.01 m. This simulation grid was divided into 12 x 12 x 12 blocks. 3D hydraulic conductivity 
tensors for each block were obtained by upscaling the local-scale simulation to the block scale 
(Fig. 2). Each individual block was upscaled with a block skin on each side. Thus, nine blocks 
were needed with the flow simulator to calculate the K tensor of the centre block. The method 
proposed by Zhou et al. (2010) was used with different sets of prescribed head conditions (three 
for flow between opposing faces, three for flow between opposing comers and two for flow 
between opposing edges). The Saltflow simulator (Molson and Frind, 2017) was used to obtain 
the 3D velocity and head fields. Mean 3D components of flux (q x ,q y ,q z ) and gradients (Vh x , Vh y , 
Vh z ) were calculated and used with Darcy’s law to solve the full 3D K tensor. 




In(K) 

—- 8.54 

1 - 70.3 

- 12.0 


Vertical exageration 200x 


ln(KJ 

—- 8.54 

1 - 70.3 

- 12.0 

m- ,3j 

■- 75.4 



•15 -14 -13 -12 -11 -10 -9 -8 


Fig. 2. Local-scale scalar K fields of unit AFB1 obtained from non-conditional TBS; the 
corresponding upscaled K xx tensorial component and bivariate transformation of the K zz 
component from a realization generated with LMC TBS. 

Analysis of the upscaled K-tensor components revealed a clear control of principal components 
with strong correlations between them. Off-diagonal tensor components were negligible for all 
HSUs. The K-tensors of the different HSUs showed correlation ranges of a few to several blocks 
with no nugget effect. The variogram behaviour at the origin appears parabolic due to the large 
change of support from quasi-point to block scale. Simple correlations (at h=0) between ln(K xx ) 
and ln(Kyy) components vary between 0.75<r<0.99. Correlations between horizontal components 
and ln(K zz ) were also strong (10 HSUs showed r>0.91). A linear model of coregionalisation (LMC) 
adequately describes the variograms and cross-variograms of ln(K xx ), ln(K yy ) and ln(K zz ) in each 
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HSU. Based on the upscaling results, two constraints were imposed: ln(Kyy)= ln(K xx ) and ln(K zz )< 
ln(K xx ). The latter constraint was applied a posteriori using the transformation of the ln(K zz ) 
realisation to the reference bi-variate distribution (ln(K xx ), ln(K zz )) of the upscaling results (Fig. 2, 
right). 

Application 

The developed block tensor coregionalisation model was used to assess regional 3D groundwater 
model of the Innisfil creek watershed (in development). For the application example, we present 
results based on a large 2D vertical groundwater flow model extracted from the watershed (Fig. 
3), chosen parallel to the flow direction based on the kriged head field. This regional cross-section 
includes five streams and cuts through the main physiographic features in the watershed: lowlands 
(north), uplands (south-central) and the highly permeable Oak Ridges Moraine (south). Although 
the model is based on real data, the assumption that the cross-section is aligned parallel to regional 
flow direction is a potential limitation. The study can thus be seen as a partially conceptual 
exercise. 

Groundwater flow was simulated using homogeneous (deterministic) and stochastic HSU K xx and 
K zz fields. The model boundary conditions are: no-flow conditions for the lateral boundaries 
corresponding to the watershed divide, a no-flow condition at the model base, seepage conditions 
to simulate streams, and imposed flux (recharge) conditions at the surface of the model. The 
recharge rates were adjusted based on expected values for surficial material, which values 
correspond respectively to 200 mm/y and 20 mm/y for coarse and fine soil deposits. Initial K xx and 
K zz values were corresponding to the upscaled geometric mean. First, the calibration was 
conducted on the deterministic model by adjusting K xx per HSU using the kriged regional zero- 
pressure surface as a calibration target. The K xx parameter was adjusted by applying numerical 
inversion with Pest algorithm (Doherty, 2005). The K zz values were derived from this updated K xx 
field using the K xx and K zz ratio from the initial values. Next, stochastic modelling was conducted 
varying the K xx and K zz fields and keeping the other settings unchanged. No further calibration 
was conducted in this modelling. However, for each realization, K xx field was modified based to 
the HSU ratio obtained for the calibration of the deterministic model (K xx _pEST/K xx _i n itiai). Bedrock 
was not included in the upscaling process. A constant K value of 10~ 5 m/s was assigned on its top 
decreasing linearly with depth to 10~ 10 m/s. 

Stochastic models with variable K tensors generated zero pressure surfaces comparable to that of 
the calibrated deterministic model (Fig 3). The impact of the K-tensor uncertainty on a typical flow 
simulation was assessed and proved non-negligible as shown with the simulations of the mean 
groundwater age. The stochastic simulations show that the mean age variation is indeed affected 
by the K tensors’ variation within the HSUs. 


7 
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340.5 


03 

LU 


87.5 



South 


North 


Kxx (m/s) 


1.0 e-3 


1.0 e-6 


Streams 


2.5 e-4 


2.5 e-7 


6.3 e-5 


6.3 e-8 


1.6 e-5 


1.6 e-8 


4.0 e-6 


4.0 e-9 


Distance (m) 


20,6000 



Mean age (years) 


12400 





Observed hydraulic head (m] 


Fig. 3. Top row: simulated K xx with stream function isolines via vorticity equation; Bottom row: 
HSU units and computed mean groundwater age for four realizations (left) along cut lines (yellow 
lines in top row) and computed vs observed hydraulic heads (right) for four realization and the 
deterministic model. 

The stochastic models were also used to calculate the capture zones of the simulated streams and 
their uncertainty using the backward-in-time transport equation (Comaton and Perrochet, 2006) 
(Fig. 4). 



Exit probability 


Fig. 4. Capture zones of the streams located in lowland area (North) for the deterministic model 
(left) and two realizations (centre and right). The capture zones were defined by the exit probability 
(0.1 colour intervals) of the random walk particles path toward the lowland streams. 

Fig. 4 shows that capture zones of the lowland are well defined with current information. Note that 
the current assessment probably underestimates uncertainty as the same deterministic HSU model 
was used for all realizations. Nevertheless, the effect of the alternative models define different 
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capture zones. Combining capture zones with mean groundwater age or its mean life expectancy, 
one can assess the vulnerability of each stream to a land contamination event and therefore 
determine possible land usage restrictions. 

Conclusion 

An efficient method was developed for regional characterisation of HSUs using 3D K tensors and 
was tested with 2D cross-sectional deterministic and stochastic groundwater flow simulations. It 
was found that within each HSU, K-tensor variability has a moderate effect on groundwater ages 
computed by the flow simulator but a relatively stronger effect on the definition of the stream 
capture zones. The results emphasize the different effects of homogeneous and heterogeneous 
HSU K-tensors under the same hydrogeologic settings and boundary conditions. These effects 
were quantified by simulating the groundwater age distribution and the capture zones of the 
streams. The proposed methodology appears well suited for determining the uncertainty of 
groundwater flow and transport models including aquifer vulnerability. 
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OPTIMISING SOIL SCREENING LEVELS TO DELIMIT POLLUTION RISK ZONES 

IN HIGH-DENSITY INDUSTRIAL AREAS 

C. Boente 1 , S. Gerassis 2 , M.T.D. Albuquerque 3 , A. Fernandez-Brana 1 , J. Taboada 2 , J.R. 

Gallego 1 

1 INDUROT and Environmental Technology, Biotechnology and Geochemistry Group, Campus 
de Mieres, Universidad de Oviedo, 33600, Mieres, Spain 
department of Natural Resources and Environmental Engineering, University of Vigo, Lagoas 

Marcosende, 36310 Vigo, Spain 

3 Instituto Politecnico de Castelo Branco, 6001-909 Castelo Branco, Portugal and 
CERENA/FEUP Research Center, Portugal 


Abstract 

Soil Screening Levels (SSLs) for contaminant elements are threshold values of reference, and they 
are required by many environmental studies and laws. SSLs are based on soil geochemical 
background data from sampling areas that are often extensive (regional SSLs). Regional SSLs are 
sometimes inappropriate for interpreting the true risk of pollution in small areas affected by 
specific factors such as geology, mining, industry, and traffic and that lie within a large region. 
The estimation of such a range of factors is unfeasible in large-scale samplings. This is precisely 
the context in which local SSLs, performed on a major scale closer to the area of interest, become 
a requirement for pollution risk assessment. To address this issue, here we performed a soil 
sampling campaign, comprising 150 samples, in the Langreo municipality (local zone), one of the 
most industrialized areas in the region of Asturias (NW Spain). Sampling allowed us to measure 
the local SSLs for several potentially toxic elements (PTEs). We then calculated a Soil Pollution 
Index (SPI) for both regional and local SSLs to assess the degree of contamination. Both pollution 
indicators were then subjected to a methodology based on a Bayesian network analysis, followed 
by a stochastic sequential Gaussian simulation approach. The combination of these approaches 
allowed us to visualize the differences in the determination of pollution risk areas, as well as to 
ascertain the effect of each PTE on the indexes, in function of the type of SSL chosen (regional or 
local). Our results revealed that, in areas where industry has a significant presence, local SSLs 
greatly facilitate the identification of polluted areas and also reduce the uncertainty associated with 
sampling density and diffuse pollution. On the basis of our findings, we conclude that local SSLs 
allow the identification of unpolluted areas that are potentially categorized as polluted when 
regional SSLs are used. 

1. Introduction 

Urban areas are a common source of PTEs. These contaminants enter the different environmental 
compartments, such as water, air and soils, and thus pose a risk for human health and the 
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environment (Jarup, 2003). The release of PTEs gains relevance when the study area is heavily 
industrialized (e.g. Zuo et al., 2017; Moreno et al., 2011). In these cases, the degree of PTE 
distribution is higher, thereby implying a larger area exposed to contaminants. 

One subject addressed in environmental science is the establishment of criteria to assess the degree 
of pollution in soils. Most methods used for this purpose often include the determination of 
threshold limits in terms of SSL assessment. When concentration levels exceed SSLs, further in- 
depth studies are required to accurately determine the extent of contamination (Antunes & 
Albuquerque, 2013) and/or to perform a site-specific risk assessment. If all the PTEs fall below 
the SSLs, no additional study is required (USEPA, 2007). 

There is some controversy with regard to the way SSLs are established. The values of these levels 
are commonly set on the basis of samples taken in extensive zones that are free of pollution. SSLs 
calculated by this procedure will not be suitable for study sites that are affected by multiple diffuse 
and sporadic PTE sources. Thus, the use of regional or even national SSLs could lead to failure to 
detect polluted sites in areas with complex sources of pollution and subsequent high soil 
contamination backgrounds. 

Here we sought to establish a contrast between local and regional SSLs, focusing on the 
identification of pollution anomalies in a paradigmatic industrial area (the municipality of Langreo 
in the region of Asturias, NW Spain). To this end, a regional soil sample database (Fernandez et 
al., 2018) used for the computation of the official SSLs of the Principality of Asturias (Spain), and 
a local database from Langreo (obtained in this work) were compared via mathematical, 
geostatistical and Bayesian machine learning approaches, with the aim to determine the effect of 
the factor scale on the delimitation of potential risk areas (Pereira and Soares, 2018). 

2. Materials and Methods 

2.1 Study area and data collection 

The study area of Langreo has a history of industrial and mining activities, some of which are still 
in operation. These activities have had a severe impact on the environment. The area is crossed by 
the Nalon River, the longest and most voluminous in the region. The area is relatively hilly, with 
altitudes from 200 m to 900 m, and it presents several narrow valleys distributed perpendicular to 
the main river. These features have produced an enclosed environment, which is susceptible to the 
accumulation of PTEs due to the geography impeding their migration. 

Sampling consisted of 10 equidistant transects (250 m wide and 1000 m apart), perpendicular to 
the Nalon river. A total of 150 samples were collected, the number per transect being determined 
proportionally to its length. The location of samples within transects was random. 
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Each sample was composed of five increases taken by means of an Edelman Auger in the vertexes 
of a 1-m edge square and its central point, from the first top 25 cm of soil. Samples were passed 
through a 2-cm mesh screen in situ to remove large material (e.g. organic matter, rocks, gravel). 
They were then dried at 35°C to prevent the evaporation of volatile PTEs and quartered by means 
of a Jones riffle splitter for soil homogenization and representativeness. Finally, they were ground 
in an RSI00 Resch mill at 400 RPM for 40 s and sent to the ISO-9002 and ISO-17025 accredited 
Bureau Veritas Laboratories (Vancouver, Canada), where total concentrations for 36 elements 
were determined by Inductively Coupled Plasma-Mass Spectrometry (ICP-MS) using the 
Ultratrace AQ250 analytical package. 

The database was simplified to 8 elements for the purpose of this study. These elements and their 
respective detection limits (ppm) were the following: As (0.1), Cd (0.01), Cu (0.01), Cr (0.5), Hg 
(0.005), Ni (0.1), Pb (0.01) and Zn (0.01). 

2.2 Calculation of Soil Screening Levels 

Outliers were selected by means of the percentile 99 of the Mahalanobis distance. General 
expression of the Mahalanobis distance between two objects is defined (Meeker et ah, 2017) as: 

d(Mahalanobis ) = [(X b - X a ) T ■ C~ x ■ (X b - X a )] 0 - 5 (1), 

where X a and Xb are the pair of objects and C is the sample covariance matrix. Up to three samples 
were excluded by this method. Thus, the final number of samples to obtain specific SSLs for 
Langreo (Local SSL) was 147 in approximately 82 km 2 . On the other hand, the SSLs for Asturias 
(regional SSLs for approx. 10,000 km 2 ) were determined from 334 samples in approximately 
10,000 km 2 from an official database. The mathematical expression for the regional/local SSL for 
element i is given by the following expression: 

SSLi(mg • kg -1 ) = C t + 2 • (2), 

where Q is the mean concentration and the standard deviation for all the samples considered in 
the dataset designed for SSL determination. 


Finally, the Soil Pollution Index (SPI) is a tool through which to identify subareas of higher or 
lower levels of pollution (Boente et al., 2017). The SPI for sample j is calculated as: 


SPIj 


, Cj 

'J_SSLj 

N 


(3), 
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where Q is the concentration of element /; SSL, is the Soil Screening Level (regional or local) of 
element i; and N is the constant for the number of pollutants considered: 8 in this study. 

2.3 Bayesian machine learning 

Recent advances in computer science offer the possibility to introduce machine learning, such as 
Bayesian networks (Benavoli et al., 2017). To determine the influence of the elements of interest 
on SPI results, we applied supervised learning to create a model characterizing a variable of interest 
(SPI), which represents the target node of the machine-learning process. Each variable was 
discretized to achieve a precise representation of the continuous values, thereby producing an 
accurate representation of the probabilistic dependencies related to the PTEs. The R2-GenOpt 
algorithm was used to generate the discretizations, as this genetic algorithm excels in supervised 
learning problems (Conrady and Jouffe, 2015) as R2 is maximized between the discretized 
variables. 


Once the Bayesian model had been built, contribution analysis was performed to determine the 
influence of each variable on the predicted mean of the target variable for each observation. This 
analytical tool uses the direct effects on the target to compute the contributions of all observable 
nodes. The direct effects are defined as the derivates of the direct effect functions: 


De x 



(4), 


where the direct effect functions (8) are estimated using a mean value analysis based on the 
MinXEnt (minimum cross-entropy) to go through the variation domain of a variable in order to 
measure its impact on the target mean. This procedure was developed via the software BayesiaLab 
v.7.0.1. 


2.4 Spatial modelling - geostatistical approach 

Geostatistical techniques are founded on the theory of regionalized variables (Matheron, 1971), 
which states that variables within an area show both random and spatially structured properties 
(Joumel and Huijbregts, 1978). Experimental variograms must be estimated and modelled to 
quantify the spatial variability of random variables as a function of their separation lag. The SPI 
used in this study was defined as a regionalized variable (Matheron, 1971) and consequently 
additive by construction, since the mean value within a given observed support is equal to the 
arithmetic average of the sample values, regardless of the statistical distribution of the values 
(Rivoirard, 2005). 

When predicting the SPI values, the delineation of zones of high and low risk requires the 
interpolation of these values to the nodes of a regular grid, thus facilitating proper risk assessment. 
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The subsequent geostatistical approach allowed the definition of the spatial distribution patterns 
for the two SPI sets (local and regional), focusing on the delineation of potential zones for future 
monitoring and/or remediation. 

Sequential Gaussian Simulation (SGS) was used as a stochastic simulation algorithm. SGS starts 
by defining the univariate distribution of values, performing a normal score transform of the 
original values to a standard normal distribution. Normal scores at grid node locations were 
simulated sequentially with simple kriging (SK) using the normal score data and a zero mean 
(Goovaerts, 1997). Once all normal scores had been simulated, they were back-transformed to 
original grade values. For the computation, the Space-Stat Software V. 4.0.18 (Biomedware) was 
used. A multiple sequence of simulation was applied to obtain more reliable probabilistic maps. 

3. Results and discussion 

The SSL results for the elements of the interest using the databases of Asturias (regional) and 
Langreo (local) are shown in Table 1. Only one element presented a similar content in regional 
and local soils (As). In contrast, various elements were more enriched in regional soils (Cd, Cr and 
Ni), while others were enriched in local ones (Cu, Hg, Pb and Zn), the latter having been reported 
as pollutants in the study area (Boente et al., 2018). The local SSLs highlighted several PTEs, 
while these compounds were smoothed when considering regional SSLs. In fact, notable changes 
in 7 out of 8 elements of interest reveal that the delimitation of pollution risk areas will vary 
depending on the SSL used. 

Table 1. Regional and local SSLs for the elements considered, and increase of the latter with 
respect to the former. 


Element 

Regional (ppm) 

Local (ppm) 

Increase (%) 

As 

39.21 

41.44 

5.69 

Cd 

1.70 

0.75 

-55.88 

Cu 

35.61 

62.12 

74.45 

Cr 

101.9 

28.91 

-71.63 

Hg 

0.49 

2.17 

342.86 

Ni 

41.31 

34.83 

-15.69 

Pb 

50.71 

142.29 

180.60 

Zn 

167.28 

251.81 

50.53 


Two supervised Bayesian networks were carried out, each establishing the regional and local SPIs 
as target nodes. Given the relatively small number of nodes and that the main objective of this 
study was to determine the impact of the PTEs on the regional and local indexes, a Naive Bayes 
algorithm was selected to build the models. This algorithm produces a straightforward structure to 
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establish the contribution of each element to the target node. Figure 1 shows the networks emerging 
from this process and the contributions obtained, which amount to 100%. 



Fig. 1 Contributions of each element to the regional (Asturias) and local (Langreo) SPIs . PTEs in 
green show a decrease in local contribution. 

The results reveal that the distribution of the contributions of each element to the SPIs is also 
altered when considering the factor scale. A great increase in the SSLs (e.g. Pb and Hg as shown 
in Table 1, but also As, which also increases) reduces the weight of the element on the SPI, that is 
to say, it smooths the effect of elements that are highly enriched. The use of regional SSLs, which 
are more restrictive, penalizes local anomalies, declaring almost the entire area polluted. On the 
other hand, the SPI constructed with local SSLs highlights polluted areas as only those that exceed 
the new SSLs; i.e., those soils which are highly polluted (Fig. 2). When setting high evidence 
P(Pb>67.2)=100%, which is the discretization associated with the higher levels found in the field, 
the inference result for the regional SPI reflects a clear dominance of Q4(>0.51), reaching almost 
98%. In contrast, for the local SPI, the probability values showed a more gradual distribution, 
reflecting a better match with the true field conditions. 


0 . 00 % 

0 . 00 % 

2.08% 

97.92% 


0 . 00 % 

0 . 00 % 

100 . 00 % 


SPI ASTURIAS (Regional) 
Mean: 0.755 Dev: 0.291 
Value: 0.755 (+0.138) 


SPI LANGREO (Local) 
Mean: 0.559 Dev: 0.178 
Value: 0.559 (+0.097) 



Q1 (<0.37) 

Q2 (0.37-0.42) 
Q3 (0.42-0.51) 
Q4 (>0.51) 


4.17% 

12 . 50 % 

35.42% 

47.92% 


Q1 (0.37) 

Q2 (0.37-0.42) 
Q3 (0.42-0.51) 
Q4 (>0.51) 


Pb 

Mean: 110.293 Dev: 53.592 
Value: 110.293 (+45.250) 

<43.27 


43.27-67.2 

>67.2 


Pb 

Mean: 110.293 Dev: 53.592 
Value: 110.293 (+45.250) 

0 . 00 % H <43.27 

0 . 00 % | 43.27-67.2 

100 . 00 %' >67.2 


Fig. 2 Inference results for regional and local SPIs when evidence is set for high concentrations of 
Pb (>67.2 ppm) 
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On the other hand, “neutral” elements, such as Ni, Cr and Cd, presented slight increases 
independently of their values in both SSLs. This observation can be explained by the fact that the 
SPI is influenced by the concentration of the PTE in soils. Wherever the PTE content is low, the 
Ci/SSLi coefficient will also be. 

In the geostatistical approach, the two SPIs went through SGS, and a hundred runs were performed 
for each one. The standard deviation was used as a measure of spatial uncertainty, setting aside the 
discussion of local accuracy. The comparison of the average images obtained for the two SPIs 
(Fig. 3 and Fig. 4) verifies that when using local SSL values, the exceeding clusters (red areas) 
diminish considerably, as did the associated spatial uncertainty. Conversely, when considering 
regional (restrictive) SSLs, almost the entire area is red, implying that it is highly polluted. This 
scenario changes when the local SSLs are considered. As the relevance of the main contaminants 
in the SPI decreased, variability (St.Dev) was therefore minor, and consequently the estimation is 
considerably more reliable, implying that potentially polluted areas are more recognizable and 
coincident with those reported in a previous study (the city and industrial cores, see Boente et al., 
2018). 



SGS.LAN 
Averagelmage_SPI_la 

H 0.51 to 1.16 

□ 0.42 to 0.51 

□ 0.37 to 0.42 
■ Oto 0.37 


0 


. 4 km 

a) 



H 

4 km 


SGS_LAN 

Uncertainty (St.Dev) 
■ 0.3 to 0.7 

□ 0.2 to 0.3 

□ 0.1 to 0.2 

□ oto 0.1 


Fig.3. a) Average Image - SPI computed using local SSL values; b Associated uncertainty - 
Standard Deviation of the 100 SGS. 
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□ 0 to 0.37 I I no to 0.1 

1 — 1 n a i 


0 


a) 


4 km 


0 


b) 


4 km 


Fig.4. a) Average Image - SPI computed using Regional SSL values; b Associated uncertainty - 
Standard Deviation of the 100 SGS. 

Furthermore, the variation between the two SPIs (Fig.5) reveals a marked increase (over 25%) in 
the northern zone of the study area. Thus, the highest variations were found in the city/industrial 
center and values decreased towards the south, where forests and natural soils are abundant. These 
findings demonstrate that regional SSLs are optimal only for the study of natural areas. 



N 


Fig.5. SPI variation (expressed in %) considering regional and local SSLs. 


4. Conclusions 

SSLs for PTEs are essential to establish threshold limits that allow the distinction between polluted 
and unpolluted soils. These values are commonly calculated in wide sampling areas (regional 
SSLs). The use of these levels may hinder the correct assessment of the degree of pollution in 
industrialized areas, where soils are affected by historical diffuse contamination. We propose that 
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local SSLs are the most appropriate method to properly identify pollution hot spots in 
urban/industrial areas. 

This study was performed in the industrialized municipality of Langreo, where the local SSLs were 
calculated and compared with existing regional SSLs by means of geostatistical and Bayesian 
analysis. The degree of pollution was established via a previously verified SPI. 

We observed that the use of local SSLs smoothed the geochemical anomalies, allowing a better 
discrimination of high risk clusters. Variations in the SPIs were more significant in areas that were 
more industrialized than in rural zones, where the regional and local SSLs were similar. 

On the basis of our findings, we conclude that the use of local SSLs, which are more representative 
of the soils in the study area, better identify the hot points and decrease the uncertainty of the 
geostatistical estimations. 
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Abstract 

Terrestrial radar interferometry (TRI) is a geodetic monitoring technique based on emission and 
reception of electromagnetic waves, whose potentially highly precise measurements in form of 
deformation maps are typically dominated by the effects of short- and long-term meteorological 
effects. In complex topographical or meteorological situations the so-called atmospheric phase 
screen (APS) often completely masks the real deformations, cannot be approximated sufficiently 
well with deterministic models and is best analysed stochastically. Due to its instationary nature 
exhibiting complex local autocorrelations, the APS is best handled by methods not operating under 
the assumption of second order stationarity. We take an approach based on optimization in 
reproducing kernel Hilbert spaces (RKHS) formally equivalent to various forms of Kriging to 
formulate the deformation estimation problem in TRI in terms of orthogonal projections. After 
discussing the suitability of our model and addressing computational issues, we briefly touch upon 
a first approach to practical implementation. The performance of the estimator in terms of root 
mean square error and convergence to ground truth is assessed based on a monitoring campaign 
carried out during several months in the Swiss alps. 

1. Terrestrial radar interferometry 

As the name implies, a terrestrial radar interferometer is a ground based instrument that emits 
coherent radar waves towards a region of interest - typically areas subjected to geohazards like 
ice- and rockfall or landslides - and uses the received echoes to survey and monitor the geometry 
of that area. The instruments available on the market have a resolution in propagation direction 
and a distance dependent resolution orthogonally to it. The final outcome of a measurement is a 
complex matrix in which a specific entry's row- and column-index correspond to its polar 
coordinates with the radar instrument's position as the origin and a reference direction depending 
on the setup of the instrument. Each pixel in the matrix is a complex number encoding amplitude 
and phase information of the signal backscattered from the area corresponding to that pixel. While 
the amplitude is determined mostly by geometric and physical properties of the reflecting surface, 
the phase is - modulo the wavelength A «17 mm - proportional to a weighted mean distance 
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between instrument and surface. Phase differences in two subsequent measurements correspond 
to changes of this weighted mean distance, and in the absence of noise imply deformations in the 
direction of the instrument's line-of-sight. References listing these and other fundamental facts 
about radar interferometry are e.g., the monographs by Hanssen (2006) and Rodelsperger (2011). 
Figure 1 plots exemplary main outputs of measuring with TRI. 
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Fig. 1 The amplitude, phase and phase difference between two subsequent measurements taken 
two minutes apart are projected onto the underlying digital elevation model. The amplitude is 
plotted in arbitrary units with brighter colours corresponding to higher amplitudes. Note that the 
phase almost looks like spatial uniform noise on [-7i,7t] as its formation depends primarily on the 
local structure of the backscattering surface. Where the signal to noise ratio is bad, only the 
background DEM is shown. In the phase difference plot, In rad = 3m/day velocity. 

The object of investigation for the data plotted in figure 1 was a fast moving glacier whose 
boundary is outlined in black. The rest of the terrain may be assumed stable. The non-zero phase 
differences - supposedly corresponding directly to surface changes - however are not restricted to 
the glacier area. These spurious deformations are mostly due to the regimes of differing 
atmospheric conditions through which the radar waves propagate. This explains the highly 
autocorrelated nature of these artifacts and a variability that increases with distance from the 
instrument. 

As Butt et al. (2017) show, this so-called atmospheric phase screen (APS) evolves dynamically in 
time in a way that precludes forward modelling on the basis of the differential equations governing 
the system. For monitoring in mountainous areas over long distances and under challenging. 
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rapidly varying atmospheric conditions, separating noise, APS and deformations as much data- 
driven as possible is a key step in practical applications of TRI and has been addressed by different 
authors. See for example (Noferini et al. 2005; Iglesias et al. 2014) and (Caduff et al. 2014) for 
atmospheric models with a decreasing amount of regularity assumptions on the APS. We follow 
the trend and take a purely stochastic perspective: deformations, APS and noise are all 
spatiotemporal random fields of which we measure but some of their values and whose 
constituents are to be estimated such that they are sufficiently likely given some prior assumptions 
on their correlation structure. This leads to a nonparametric spline function whose connections to 
geostatistical procedures and underlying assumptions will be cleared up in the next section. 

2. Statistical inference with splines 

Let J.'Tx Q a (t, of) a X'" e \ and N'. :Tx Cl a (t, of) a JV"e j be two independent mean- 
zero Gaussian processes on T with each of the X\ and N‘ : Q —» i square integrable random 
variables on the probability space Q. Denote the covariance functions by 

Co v(X s ,X t ) = K x (s,t) = E[X s X t ] 
and Co v(N s ,N t ) = K N (s,t) = E[N s N t ] 

respectively. In the above formula £[•] is the expectation operator, and we drop explicitly 
expressing the dependence of the random variables on Q when no confusion will arise. Suppose 
some noisy observations {y k } n k=l a \ are given that are realizations of the stochastic process 

r u‘ x , +N '. d.l) 

on {t v ...,t n }, where {t k } n k=l c T is a sequence of positions in T at which A. and N . are evaluated 
and N represents noise to be eliminated from observation. Under a joint multivariate Gaussian 

n 

hypothesis, the best predictor of X t given Y := {Y tt } n k=l is that particular X t = t {t)Y t which is 

i =1 

the conditional expectation E[X t \ Y h ,...,Y t ] . Since Y = [I I~\[X N] T where A~N (0,E W ) , 
A ~ N (0,S W ) with (X w . ), y =K x (t i ,t J ) , (S AW ) ;/ = A' v (/,,/, ) and I the identity matrix, we have 
Y ~ N (0, + E ;VV ) and consequently 
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where Z xx =K x (t,t ) and E xx = [K x (t,t l ),...,K x (t,tJ] which implies for a particular Y = y the 
conditional distribution (Press 2012, p. 73) 


^ I ^ ~ N (£ x ,jsr( 2 jar + 2 m ) X y,^x t x, -^x.x&xx +2 aw) l ^xx,) ■ O' 2 ) 

Therefore the best predictor equals the best linear unbiased predictor 

X=[K x {t,tX-.,K x {t,t n )][^xx +^ w r 1 T (1.3) 

and minimizes the expected square error E[$ t - X t ) 2 \ locally for all teT (Berlinet and 

Thomas-Agnan 2011, p.76). As a function of is also the minimizer (Bezhaev and Vasilenko 
2013, p. 161) 


= argmin II Ax t —y II \ + II x t II 2 n 

x ,£ u k x 


(1.4) 


where A : a x t a [x t ,...,x t ]' e ; " is called a measurement operator, II -ll z is induced by 

</', g> Ivv = (2 N N f,g) n and H A . is the reproducing kernel Hilbert space (RKHS) of functions on 
T with the inner product 


00 ] 

(f’g) ft K “X 1 ,( Pj} L 2 (T) 

j =1 'b 

for all f,g e H A based on the Mercer decomposition (Nashed and Wahba 1974) 


K x (s,t) = Y J ^ l (p j ( s )<P l ( t ) • (1-5) 

7=1 

Optimizers in RKHS of the type encountered in equation (1.4) are called splines, and their 
similarity and - under certain conditions - equivalence to Kriging are well known (Berlinet and 
Thomas-Agnan 2011, p.88). Optimal estimation in the least mean-square sense can therefore be 
performed by solving a global optimization problem in function space instead. The general abstract 
spline problem is 
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x,eH 

H,A,B Hilbert spaces 

A : H —> A the bounded linear measurement operator 
B : H —> B the bounded linear energy operator 

for which solutions as well as existence and uniqueness conditions are provided in (Berlinet and 
Thomas-Agnan 2011, pp. 116-117). With this more complicated framework we can easily 
accommodate correlation structure introduced by the geometry of the measurements in a natural 
way. Suppose t 0 e j 3 are the geometric coordinates of the radar interferometer and {t k } n k=x a j 3 
are the coordinates of the backscattering points while X t is a three-dimensional stochastic process 
representing (differential) atmospheric effects. Then the measurement operator 

A:X, a l£ =o X(t 0 +s(t k -t 0 )) II t k -t 0 II ds} n k=] =: {£ X t dt} n k=x e j " (1.7) 

interpreted as a line integral, represents the process by which an electromagnetic wave accumulates 
atmospherically induced deviations while travelling along its propagation path. Given any prior 
covariance function K x over X. , the measurement operator induces the covariance function 

K AX (s,t) = A®AK x = f f K x (u,v)dudv (1.8) 

Jt 0 Jt 0 

and relates the covariance of a hypothetical unobservable atmospheric field of differential effects 
to the covariance of line integrals through it - the latter being a model for the actually observed 
APS. We use this in the following signal separation model for deformation estimation in TRI in 

ft(Pi) rt(p 2 ) 

which we define K AX (p v p 2 ) = " K x (u,v)dudv for interferogram pixels p with t(p) the 

spatial coordinates associated to this pixel. 

The measurements Y , deformation D and APS P are Gaussian processes with D independent 
P on the index set T = j 3 factoring as T = T x xT 2 with 7j = j 2 pixel coordinates and T 2 = j 
time. We propose 
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( 1 . 6 ) 


y;=d;+p; (1.9) 

where D[ represents deformation with spatiotemporal covariance function K D (-,•) assumed 
known or inferable and P‘ represents the APS with spatiotemporal covariance function 
K p (-, •) = A ® AK X assumed unknown and to be inferred. The operator A is as in equation (1.7) . 
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The estimator for D at pixel position and time t e T based on concrete phase measurements 
ye j "is given as (Bezhaev and Vasilenko 2013, p. 161) 

d (•) = argmin II Ad -y II \ + II d II „ 

deR K D PP K ° 

( 1 . 10 ) 

=1A=(x DD +x PP y l y 

7=1 

where A is evaluation at {t k }" =l =K p (t i ,t j ) and (Z DD ) V = K D (t i ,t j ) ■ 

3. Computational aspects 

Expressions of type (1.10) are hard to evaluate due to the cost of inverting ~L PP + Z DD - an 

operation the effort for which scales as « with the number of points involved. Our measurements 
have a spatial and temporal component and if we denote the RKHS of spatiotemporal 
measurements y as H® with kernel 

k? = k;, ® k‘ y k s y =k s d +k; k^k^+k'p (2.1) 

then we may choose to factor H ® as 


h® = h;®h;. (2.2) 

In the above, ® signifies the usual tensor product of Hilbert spaces or of elements therein 
(Aronszajn 1950), the subscripts Y,D,P indicate whether the Hilbert space or kernel is associated 
with measurements, deformations or APS whereas the superscripts s and t signify spatial or 
temporal dimension. With some straightforward computation, one sees that the solutions to the 
two separate problems 


d s, ‘(-) = argmin II Ax-y s,t \\ , + II x 


(2.3) 


xeHi/ 


ar e(s,t being a placeholder meaning either s or t with st indicating the spatiotemporal case) 


d°'\ -)=Z^(’^)^ 

7=1 


ns,t i X'SJ \-l 

A =\ l dd +l pp) y 


(2.4) 


and we may write the predictions d s ' = [d s, ‘(t n )] as 
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'l S,t S,t f^S,t . ^S,t \-l S,t r—'S,t S,t 

a — ^DD \^DD ^PP ) y ~ y 


respectively. The tensor product of the solution operators is 


/On 


(2.5) 


( 2 . 6 ) 


It is computationally easier to handle since the matrices to be inverted are significantly smaller and 
the predictions H® y st are the evaluations of the spatiotemporal estimate d 0 of the deformation 
which is the orthogonal projection onto H ® of an interpolating spline in H ® . The outcome 
of the calculations is that part of the observed spatiotemporal field, which exhibits correlation 
structure as prescribed by K‘ D in time and K s n in space. For more details, see the chapter on tensor 

splines in (Bezhaev and Vasilenko 2011, pp. 175-187). It may furthermore be possible to 
approximate the double integral kernels introduced in equation (1.8) numerically more efficiently 
than previously done in (Butt et al. 2016) however, we leave this for future work and instead make 
local use of stationary kernels and a partition of unity. 

5. Practical implementation and results 

For a first implementation, we replace the formally more correct model stated in equation (1.10) 
by exchanging the instationary double-integral term K p (-,-) by a stationary kernel to enable 

efficient calculation of the covariance matrices and tensor product factorizations. It is then possible 
to directly use equation (2.6), although it is advisable to restrict the dataset used for prediction, e.g. 
by not using all measurements simultaneously but apply the estimation formulas for a sequence of 
overlapping windows. The separate predictors are combined in a weighted average with the 
weights coming from a partition of unity. Many measurements are subject to high levels of 
uncorrelated noise; we summarily discard measurements which do not fulfill a minimum 
requirement on the signal-to-noise ratio as for example quantified with help of the amplitude 
dispersion index (Ferretti et al. 2001). The covariance function best suited to model the APS is 
inferred via brute force on the parameters of three parametric covariance functions (power law, 

exponential, squared exponential) and the combination minimizing II E emp - ~E g II p was chosen. 
Here £ emp is the empirical covariance matrix of phase differences on stable points, T, e is the 
covariance matrix predicted by the covariance function with parameters 0 on the same points and 
II • II F is the usual Frobenius norm. Based on prior knowledge, we prescribed the spatiotemporal 
structure of the deformation phenomena to be ragged in space and smooth in time by defining 
K ® = K S D 0 K‘ D with K' d an exponential and K' D a squared exponential kernel. 
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The algorithm just outlined was applied to a challenging dataset featuring mountainous terrain, 
distances of up to 8 km between radar and farthest point of interest, and the highly variable 
atmospheric dynamics already mentioned and illustrated in section 1. The data consist of a 
sequence of 720 interferograms of size approximately 1300 x 250 Pixels spanning the duration of 
one day, i.e. every interferogram details phase changes between measurements 2 minutes apart. 
For three different interferograms in the time series, the predicted APS and deformations are 
plotted in figure 2. 


Interferogram at 12:00 p.m. 




Estimated APS at 12:00 p.m. 


Estimated deformations at 12:00 p.m. 
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Fig. 2 The first row consists of three 2-minute interferograms formed at three different times 10 
minutes apart. The second row exhibits the estimated atmospheric phase screen whereas the third 
row shows the final deformation estimates. Note that in reality, deformations are limited to the 
region outlined by the black boundary. This is consistent with the output of the algorithm. The area 
outlined in red is known to be stable and used for validation purposes. 

We document in table 1 the algorithm's performance compared to - apart from the trivial option of 
not filtering at all - two approaches that operate separately in time and space. One is simple 
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stacking and the other one the fitting of second order polynomials to model the APS (Noferini et 
al. 2001). 


Tab. 1 The root mean square error (RMSE) in m/day of the predictions with different atmospheric correction methods 
dependent on the amount of interferograms used for one specific (but not atypical) day (see text for further details). 


Method) # Intfs. 

1 

4 

16 

64 

256 

720 

Trivial 

0.48 | 0.48 

0.59 | 0.59 

0.84 | 0.84 

0.73 | 0.73 

1.36 | 1.36 

1.58 | 1.58 

Stacking 

0.48 | 0.48 

0.33 | 0.38 

0.60 | 0.66 

0.35 | 0.43 

0.29 | 0.39 

0.24 | 0.36 

Polynomial 

0.42 | 0.48 

0.45 | 0.47 

0.57 | 0.59 

0 . 57 | 0.57 

0 . 88 | 0.89 

0.91 | 0.93 

RKHS 

0.13 | 0.29 

0.12 | 0.26 

0.11 0.31 

0.12 | 0.25 

0.09 | 0.16 

0.09 | 0.12 


Estimation of deformations was performed in a 200 x 50 Pixel window (marked in red in figure 
2) based on 1,4, 16, 64, 256 and the full set of 720 interferograms and compared to the ground 
truth. The left number in each of table l's cells is the RMSE in the case where the ground truth is 
0 m/day (area is known to be relatively stable) whereas the right number is the RMSE in case of a 
synthetic nonzero ground truth. To generate the latter data, we added a simulated spatiotemporal 
random field with correlation structure K ® to the measurements gathered on the stable area. As the 

temporal correlation lengths of the APS in our data are small, the RKHS predictions behave 
similarly to the ones made by stacking as both act essentially as low-pass filters in time. We 
critically remark that including prior knowledge is crucial to enhance the capability of the 
algorithm - it hinges on the correlation structure for the deformation being correctly specified by 
the user. It furthermore benefits from information about stable areas, as it can be used to first 
estimate the APS, subtract it from the data and perform filtering on the residuals so obtained. The 
results indicate that both temporal and spatial characteristics are important for signal separation 
and that a spatiotemporal approach reduces the RMSE of deformation estimations significantly. 

6. Outlook and conclusion 

We presented a stochastic model for measurements in terrestrial radar interferometry that allows 
to pose the signal separation problem of optimally inferring deformations from radar data 
adversely affected by atmospheric artifacts as energy minimization problems, whose solutions are 
splines expressible in closed form. These closed form solutions, however, turned out to be more 
of theoretical value as their implementation and evaluation is associated with a prohibitively heavy 
computational cost. This motivated us to exchange our physically justified instationary stochastic 
model for a similar stationary one that enabled easy tensor product factorization of the involved 
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matrices and simplified the inference procedure for the spatiotemporal case enough to make it 
practically feasible. We remark that our algorithm combines knowledge about spatial and temporal 
correlation of the deformation phenomena to be extracted from measurements with data-driven 
knowledge about the APS' correlation structure to derive a best guess respecting the characteristics 
of both deformation and APS in space and time. In future work we will aim towards making the 
formal stochastic model based on modelling the APS as coming from line integration through a 
three-dimensional random field accessible to spatiotemporal inference as well. 
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Abstract 

In Geostatistics, vectorial data in two dimensions such as measures for wind field, electro¬ 
magnetic field and ocean currents can be appropriately modelled by recalling the theory of 
complex-valued random field. 

This is suitable to describe phenomena whose components are characterized by homogeneous 
quantities, with the same unit of measurement. 

After reviewing the fundamental aspects of complex formalism and the procedure for estimating 
and fitting a complex covariance function, a new model suitable to include the temporal component 
is proposed. 

A case study on a vectorial data set regarding surface ocean currents is provided. In particular, 
these data, derived from high frequency (HF) radar systems (source: National Oceanographic Data 
Center into the National Centers for Environmental Information), have been measured during the 
30th of April 2016, at 207 stations along the US East and Gulf Coast. The related computational 
aspects have been carried out by using the cgeostat software. 

Keywords: vectorial data, spatio-temporal covariance 

1. Introduction 

Although vectorial data in two dimensions can be treated by applying the theory of multivariate 
random functions, the use of the framework of complex-valued random fields is particularly 
appropriate when the components of a vectorial random field describe homogeneous entities of the 
same physical phenomenon characterized by the same units of measure, such as wind speed, sea 
current, force, electric or magnetic field. 

The theory of complex-valued random fields in Geostatistics was introduced in Wackemagel 
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(2003), where just a few pages were devoted to prediction aspects; on the other hand, in De Iaco 
et al. (2003) only some problems related to the construction of new complex covariance models 
were faced and some examples of complex models were furnished. It is worth noting that just some 
basic concepts on the theory of complex stochastic processes (not random fields) were also 
discussed in Yaglom (1986). 

Some estimating and modeling aspects of cross relationships among the components of a complex¬ 
valued random field were recently explored in De Iaco et al. (2013) as well as some computational 
aspects and an environmental application can be found in De Iaco and Posa (2016). The cgeostat 
software for analyzing complex-valued random fields is also available (De Iaco, 2017). 

However, none of the above contributions have considered how the temporal dimension can be 
treated and the effects on the complex covariance models. Thus, the novelty of this paper concerns: 
(i) the definition of complex-valued random fields indexed in time and the proposed complex 
covariance; (ii) a suitable case study regarding ocean currents measured in space-time. In this 
paper, after introducing the framework of complex-valued random fields indexed in time, a 
reasonable complex covariance model has been presented and the procedure that can be applied to 
fit the model has been discussed (Sect. 2). An application for ocean current data, collected at some 
monitoring stations located along the US East and Gulf Coast and for different hours, has been 
provided. The proposed model has been fitted and used for prediction purposed with satisfactory 
results (Sect. 3). These last results are very useful to predict the transport of sediments or pollutants 
and to investigate how the ocean system might react to external changes such as the global climate 
change. 

2. Complex-valued random fields indexed in time 

A complex-valued random field is defined on the complex plane C as 

Z t ( u) =X t (u)+i U(u), u t£M, M 

where the components X f and 7? of Z? are real-valued random fields indexed in time and i =V—1. 
For example, in the case of ocean currents, X t and Y, represent the components, for the time point 
t, of the vectorial decomposition of the current along two arbitrary axes. 

As in the real case, a complex-valued random field indexed in time is second-order stationary in 
space, if the expected value, E[Z t (u)] = mt = mxt + i m Yt > with mxt = E\Xt (u)], myt = E[Y t (u)], 

for any u of the domain and the covariance function R depends, for any u and u, on the lag separation 
vector h s = u- u eE^, that is 
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Rt (hi) = R/' e (h s ) + i R t im { hi). 


( 2 ) 


where R t re (\is) = /?»(h 5 )+/?»(h s ) and J?/ m (h,) = R YlXt (hs)-RxtY, (h,). 


For second-order stationary complex-valued random fields Z t , the Hermitian form of non-negative 
definiteness condition simplifies to 


V " 1 


2^2-t Rt ^ Ui ~ u J^ a i a j - °- 


i =1 i =1 


By using the approach in De Iaco et al. (2003), a family of space-time complex covariance 
functions can be defined as follows 


R t (h, v ; c,) = cos(h, • c t )R t (h v )+i sin(hv C,)^(hv) ( 3 ) 

where R t (hi) is a real spatial covariance function indexed in time and c, G M N , t G M the shifting 
factor. 


Looking simultaneously to Eq. (2) and Eq. (3), it is worth pointing out that the real part ms’(h. s C/)/?, 
(hi) describes the sum of the covariances Rxt (h s ) and Ryi (h*), while the imaginary part 
sin(h,xCt)Rt (hi) is a model for the difference of the two cross-covariances Rnxt and Rxm- 
Moreover, a discussion on the effect of the shifting factor can be found in De Iaco et al. (2013). Note 
that the shifting factor is assumed to be time-dependent and the complex covariance function 
stationary in space. 

This model can be justified when it is reasonable to foresee a common spatial structural component 
over time and a systematic temporal component that influences the shifting factor and 
consequently the delay effect of the X t component with respect to the Y t or a delay effect of the Y t 
component with respect to the X,. In particular, each component (c t , i, c t , 2 , ..., c t ,N) of the shifting 
factor c t might be modeled through a deterministic function or a function with stochastic 
parameters. 

From a computational point of view, first of all the user has to compute the empirical direct and 
cross-covariances of the vector components for different lags and directions of the spatial domain 
as well as for different time points, then the sum of the sample direct covariances and the difference 
of the sample cross-covariances for relevant directions. Consequently, fit the parameter vector c t 
jointly to the sum of the sample direct covariances and the difference of the sample cross¬ 
covariances determined for different temporal points and select a common anisotropic spatial 
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covariance model R t (•;&), with parameter vector 6 t , and fit it to the sum of the sample direct 
covariances and the difference of the sample cross-covariances for each time point. Regarding 
prediction, different forms of complex kriging have been proposed in the literature. Some 
properties of various complex predictors are discussed in De Iaco et al. (2013) and the recent 
software cgeostat can be applied to analyze phenomena with a representation on a complex domain 
(De Iaco, 2017). 

3. Case Study 

The theory of complex valued random fields has been applied to analyze ocean currents 
measurements, collected during the 30th of April 2016 (at different hours 00:00, 03:00, 06:00, 
09:00, 12:00, 15:00, 18:00, 21:00) in 207 monitoring stations located along the US East and Gulf 
Coast (https://www.nodc.noaa.gov/gocd/hfportal.html) . 

Fig. 1 shows the 207 monitoring stations regularly distributed over a (2 x 2 km cell size) grid 
(black points) and the 184 regularly distributed over a (6 X 6 km cell size) coarse grid (red points). 
Note that data available for the coarser grid have been utilized for prediction purposes, as 
highlighted in Sect. 3.3. 

The available data have been reasonably considered as a realization of a complex-valued random 
field Z t indexed in time, defined as in (1), where X, and Y t denote, respectively, the ocean currents 
speed and the ocean currents directions at time point t. In the next sections, structural analysis has 
been proposed and kriging predictions have been computed. Predicted values have been com¬ 
pared with the observed ones, as specified hereafter. 
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in-situ monitoring stations (6x6 km)-grid 
in-situ monitoring stations (2x2 km)-grid 


Fig. 1. Location maps of ocean currents in the US East and Gulf Coast at hours: a) 00:00, b) 03:00, 
c)06:00, d) 09:00, e) 12:00, f) 15:00, g) 18:00, h) 21:00 

3.1 Structural analysis 

Before fitting the complex covariance model in (3), an inspection of the structural characteristics 
of the evolution of the components of ocean currents has been carried out. 

In particular, structural analysis of complex random fields in space-time, when model (3) is 
considered,has required: 

• the estimation of the directional covariance functions for X t and Y t and the corresponding 
cross-covariances for different spatial lags and for the selected hours, 

• the computation of the components of the complex covariance functions, i.e. the sum of 
the A and Y directional covariates (/?A 7 (h)+/?;v (h)) and the difference between directional 
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cross-covariance (i?yt»(h)-i?vfFf(h)), 

• the estimation of the shifting parameters c t , 1 and c t , 2 , fitted through least squares. 

Taking into account the results of the structural analysis for the analyzed hours the presence of 
geometric anisotropy has been detected and the Gaussian covariance function has been selected in 
order to model R t (-) in (3). Sample real and imaginary components and their models for the 
complex covariances, referred to 4 directions (N, E, NE, SE) and 8 time points (hours) are shown 
in Fig. 4. 

Moreover, in Tab. 1 the estimated parameters for the Gaussian covariance models R t (•) and 
the shifting factors, for the selected hours t are provided. 





b) 


c) 


d) 





g) 


e) 


h) 


Modulus -0.01 -* 1.00 


Fig. 2. Location maps of ocean currents in the US East and Gulf Coast at hours: a) 00:00, b) 03:00, 
c) 06:00, d) 09:00, e) 12:00, f) 15:00, g) 18:00, h) 21:00 
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Table 1. Estimated parameters for the Gaussian covariance models R t and the shifting 
factors, for the selected hours t 



R t parameters 

Shifting factors 

t 

sill 

max range 

min range 

c t, 1 

C t,2 

(hour) 


(direction) 

(direction) 



00:00 

0.017 

0.700 (N) 

0.300 (E) 

8.621 

-3.899 

03:00 

0.006 

0.280 (NE) 

0.200 (SE) 

1.982 

1.683 

06:00 

0.016 

0.540 (E) 

0.370 (N) 

-7.685 

-3.203 

09:00 

0.017 

0.340 (SE) 

0.170 (NE) 

11.267 

2.027 

12:00 

0.017 

0.500 (N) 

0.300 (E) 

-11.702 

-7.243 

15:00 

0.220 

0.270 (NE) 

0.260 (SE) 

-12.838 

1.706 

18:00 

0.027 

0.330(E) 

0.320 (N) 

6.916 

-0.451 

21:00 

0.040 

0.330 (SE) 

0.320 (NE) 

-3.161 

-2.960 


As highlighted in Fig. 3, the components of the shifting factor are characterized by a clear periodic 
behavior, which effect on both the even and on the odd part of the complex covariance function. It 
is useful to recall that the sign of the shifting components is strictly related to a delay effect of the 
X t component with respect to the Y, or a delay effect of the Y, component with respect to the X t ; as 
a consequence the imaginary part of the complex covariance function might be described by an odd 
function which is convex or concave, respectively, for positive lags. As discussed in De Iaco et al. 
(2013) 

(i) ifthe components c t ,\ andc ,2 ofthe shifting vector are both positive, the imaginary part ofthe 
covariance function assumes increasing positive values in a neighborhood of the right-hand 
side of zero. This might be associated with an effect of delay (similar to the delay effect in 
time series) ofX t with respect to Y, along specific directions; in other words this means that for 
some lags h v the variable Y t is a leading indicator of the variable X t . For example, it might 
happen that great (low) values of the Y t component of the complex variable at a location u 
imply great (low) values of the X t component of the vector at a location (u + h v ); 

(ii) ifthe components c t ,\ and ct,i ofthe shifting vector are both negative, the imaginary part ofthe 
covariance function assumes decreasing negative values in a neighborhood of the right-hand 
side of zero. This might be associated with an effect of delay (similar to the delay effect in 
time series) of Twith respect to A) along specific directions. For example, it might happen that 
great (low) values of the X, component of the vector at a location u imply great (low) values 
of the Y t component of the vector at a location (u + h v ); 

(iii) if the components c t , i and c t , 2 ofthe shifting vector have different sign, the imaginary part of the 
covariance function can assume increasing positive values or decreasing negative values in a 
neighborhood of the right-hand side of zero, according to the magnitude of the shifting vector. 
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Fig. 3 : fitted values of the shifting factors ct,l and ct,2. 

The complex covariance models, with the parameters specified in tab. 1, have been cross-validated 
on the basis of the sample data observed at the 207 locations and for different hours. The statistics 
on cross-validation results, shown in tab. 2, encourage the use of the complex kriging for ocean 
currents predictions. Note that both the mean absolute error and the root mean square error related 
to the estimated values obtained by the complex kriging are very low. Moreover, statistical tests 
are conducted in order to verify that the differences between the means of the true values and the 
estimated ones for the components xt and yt are nil. The p values of the test statistics indicate that 
the null hypothesis is retained, hence the complex covariance function can be considered 
appropriate for modeling the spatial correlation of the analyzed vectorial components. 


Table 2. some statistics on cross-validation estimations for hours 00:00, 03:00, 06:00, 09:00, 
12:00, 15:00, 18:00, 21:00, over a (2x 2 km cell size)-grid 



hour 00:00 

hour 03:00 

hour 06:00 

hour 09:00 


hour 1 

12:00 



hour 1 

15:00 



hour 

18:00 


hour 21:00 

Cross-validation 

X 


Y 


X 

1 


A 


Y 


X 


Y 


A 


y 


X 


y 


A 


y 


X 


y 


statistics 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

Num. of data 

204 

204 

204 

204 

200 

200 

200 

200 

195 

195 

195 

195 

192 

192 

192 

192 

207 

207 

207 

207 

206 

206 

206 

206 

192 

192 

192 

192 

201 

201 

201 

201 

Mean 

-0.15 

-0.15 

0.12 

0.12 

-0.07 

-0.07 

0.14 

0.14 

-0.12 

-0.11 

0.11 

0.11 

-0.24 

-0.24 

-0.13 

-0.13 

-0.14 

-0.14 

0.13 

0.13 

0.01 

0.01 

0.32 

0.32 

-0.02 

-0.02 

0.10 

0.11 

-0.08 

-0.07 

-0.10 

-0.10 

Std. dev. 

0.10 

0.11 

0.11 

0.17 

0.07 

0.07 

0.04 

0.04 

0.11 

0.21 

0.10 

0.16 

0.13 

0.13 

0.07 

0.09 

0.11 

0.12 

0.11 

0.18 

0.11 

0.11 

0.14 

0.15 

0.09 

0.11 

0.16 

0.17 

0.14 

0.14 

0.16 

0.18 

Std. err. 

0.01 

0.01 

0.01 

0.01 

0.01 

0.00 

0.00 

0.00 

0.01 

0.02 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

Minimum 

-0.33 

-0.45 

-0.16 

-0.71 

-0.21 

-0.22 

0.00 

0.00 

-0.33 

-1.67 

-0.48 

-0.99 

-0.45 

-0.54 

-0.40 

-0.40 

-0.33 

-0.61 

-0.21 

-0.82 

-0.35 

-0.35 

-0.10 

-0.10 

-0.27 

-0.58 

-0.36 

-0.33 

-0.31 

-0.32 

-0.41 

-0.70 

Maximum 

0.15 

0.27 

0.48 

1.36 

0.11 

0.10 

0.25 

0.24 

0.14 

1.29 

0.17 

1.41 

0.17 

0.10 

0.14 

0.33 

0.14 

0.26 

0.43 

1.36 

0.68 

0.63 

0.63 

0.69 

0.29 

0.31 

0.54 

0.72 

0.40 

0.33 

0.36 

0.62 

Mean difference test 

































Hypoth. difference 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 

p value 


0.88 


0.91 


1.00 


0.84 


0.60 


0.99 


1.00 


0.82 


0.89 


0.70 


0.86 


0.92 


0.76 


0.83 


0.91 


0.95 

MAE 


0.02 


0.05 


0.02 


0.02 


0.05 


0.05 


0.03 


0.03 


0.03 


0.06 


0.03 


0.06 


0.03 


0.04 


0.03 


0.05 

RMSE 


0.04 


0.12 


0.03 


0.04 


0.17 


0.14 


0.05 


0.05 


0.06 


0.14 


0.04 


0.08 


0.06 


0.06 


0.04 


0.09 
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Fig. 4. real part of the sample complex covariance function (*) and its model (_) with the 
corresponding imaginary part ( ,) and its model ( ) for different directions at hours: a) 00:00, b) 
03:00, c) 06:00, d) 09:00, e) 12:00, f) 15:00, g) 18:00, h) 21:00 
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3.2 Predictions 

Complex kriging has been applied to estimate the ocean currents over the spatial domain covered 
by the red points in Fig. 1 and for 8 hours. For this aim, the complex covariance model selected on 
the basis of structural analysis conducted for data available over the smaller domain covered by the 
black sampled points in Fig. 1 has been used. This choice is consistent by taking into account the 
goodness of cross-validation results (Tab. 3) obtained over the extended domain, for the selected 
hours. In other terms, this means that the complex covariance model related to the small domain 
can be adopted to describe the complex covariance function over the wider domain. 


Table 3. Some statistics on cross-validation estimations for hours 00:00, 03:00, 06:00, 09:00, 
12:00, 15:00, 18:00,21:00 over a (6x6 km cell size)-grid 



hour 00:00 

hour 03:00 

hour 06:00 

hour 09:00 

hour 12:00 


hour 

5:00 



hour 

8:00 



hour 

-1:00 


Cross-validation 

X 


y 


X 


y 


X 


y 


X 


y 


A 


y 


A 


y 


X 


y 


X 


y 


statistics 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

True 

Est. 

Num. of data 

161 

161 

161 

161 

169 

169 

169 

169 

160 

160 

160 

160 

167 

167 

167 

167 

171 

171 

171 

171 

184 

184 

184 

184 

179 

179 

179 

179 

173 

173 

173 

173 

Mean 

-0.10 

-0.10 

0.05 

0.05 

-0.05 

-0.05 

0.09 

0.09 

-0.02 

-0.02 

-0.07 

-0.06 

-0.11 

-0.10 

-0.11 

-0.10 

-0.10 

-0.10 

0.05 

0.05 

0.01 

0.01 

0.18 

0.18 

0.05 

0.05 

0.07 

0.07 

-0.01 

-0.01 

-0.01 

-0.02 

Std. dev. 

0.12 

0.12 

0.12 

0.13 

0.09 

0.09 

0.08 

0.07 

0.11 

0.11 

0.11 

0.11 

0.14 

0.14 

0.12 

0.10 

0.08 

0.09 

0.12 

0.11 

0.11 

0.11 

0.15 

0.15 

0.12 

0.12 

0.15 

0.15 

0.14 

0.14 

0.20 

0.20 

Std. err. 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.02 

0.02 

Minimum 

-0.37 

-0.43 

-0.41 

-0.46 

-0.27 

-0.26 

-0.16 

-0.08 

-0.27 

-0.27 

-0.40 

-0.34 

-0.45 

-0.46 

-0.57 

-0.39 

-0.27 

-0.44 

-0.30 

-0.36 

-0.28 

-0.25 

-0.24 

-0.17 

-0.22 

-0.19 

-0.45 

-0.41 

-0.38 

-0.31 

-0.58 

-0.61 

Maximum 

0.18 

0.19 

0.31 

0.43 

0.22 

0.21 

0.43 

0.33 

0.29 

0.28 

0.13 

0.24 

0.16 

0.12 

0.30 

0.22 

0.08 

0.08 

0.29 

0.27 

0.38 

0.34 

0.55 

0.58 

0.34 

0.32 

0.36 

0.39 

0.26 

0.25 

0.32 

0.34 

Mean difference test 

































Hypoth. difference 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 

p value 


0.93 


0.81 


0.93 


0.92 


0.84 


0.86 


0.99 


0.79 


0.85 


0.93 


0.95 


0.82 


0.98 


0.93 


0.90 


0.91 

MAE 


0.02 


0.03 


0.02 


0.02 


0.02 


0.03 


0.03 


0.04 


0.03 


0.03 


0.04 


0.04 


0.02 


0.03 


0.02 


0.03 

RMSE 


0.04 


0.06 


0.02 


0.04 


0.04 


0.04 


0.04 


0.05 


0.04 


0.05 


0.06 


0.05 


0.03 


0.05 


0.03 


0.04 


The results in Tab. 4 also show the goodness of the complex covariance models for ocean currents 
predictions for each time instant (hour), over a finer grid (with lag spacing along the vertical and 
horizontal direction equal to 2 km), starting from the coarse grid (with lag spacing along the vertical 
and horizontal direction equal to 6 km). 


Table 4. Some statistics on predictions for hours 00:00, 03:00, 06:00, 09:00, 12:00, 15:00, 18:00, 


21:00 over a (2 x2 km cell size)-grid 



hour 00:00 

hour 03:00 

hour 06:00 

hour 09:00 

hour 12:00 

hour 15:00 

hour 18:00 

hour 21:00 

Predictions 

statistics 

X 

True 

Est. 

Y 

True 

Est. 

X 

True 

Est. 

Y 

True 

Est. 

X 

True 

Est. 

Y 

True 

Est. 

X 

True 

Est. 

Y 

True 

Est. 

X 

True 

Est. 

Y 

True 

Est. 

X 

True 

Est. 

Y 

True 

Est. 

X 

True 

Est. 

Y 

True 

Est. 

X 

True 

Est. 

Y 

True 

Est. 

Num. of data 

161 

1225 

161 

1225 

169 

1225 

169 

1225 

160 

1225 

160 

1225 

167 

1225 

167 

1225 

171 

1225 

171 

1225 

184 

1225 

184 

1225 

179 

1225 

179 

1225 

173 

1225 

173 

1225 

Mean 

-0.10 

-0.09 

0.05 

0.05 

-0.05 

-0.06 

0.09 

0.09 

-0.02 

-0.03 

-0.07 

-0.06 

-0.11 

-0.11 

-0.11 

-0.10 

-0.10 

-0.10 

0.05 

0.06 

0.01 

0.00 

0.18 

0.16 

0.05 

0.05 

0.07 

0.06 

-0.01 

0.01 

-0.01 

0.03 

Std. dev. 

0.12 

0.13 

0.12 

0.14 

0.09 

0.09 

0.08 

0.08 

0.11 

0.12 

0.11 

0.13 

0.14 

0.14 

0.12 

0.12 

0.08 

0.09 

0.12 

0.14 

0.11 

0.11 

0.15 

0.14 

0.12 

0.13 

0.15 

0.16 

0.14 

0.13 

0.20 

0.19 

Std. err. 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.01 

0.00 

0.02 

0.01 

Minimum 

-0.37 

-0.83 

-0.41 

-0.84 

-0.27 

-0.29 

-0.16 

-0.17 

-0.27 

-0.52 

-0.40 

-0.60 

-0.45 

-0.51 

-0.57 

-0.57 

-0.27 

-0.52 

-0.30 

-0.67 

-0.28 

-0.32 

-0.24 

-0.23 

-0.22 

-0.29 

-0.45 

-0.45 

-0.38 

-0.35 

-0.58 

-0.52 

Maximum 

0.18 

0.52 

0.31 

0.65 

0.22 

0.24 

0.43 

0.44 

0.29 

0.29 

0.13 

0.31 

0.16 

0.18 

0.30 

0.36 

0.08 

0.37 

0.29 

0.52 

0.38 

0.34 

0.55 

0.55 

0.34 

0.34 

0.36 

0.37 

0.26 

0.28 

0.32 

0.38 
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Abstract 

The paper aims to illustrate the importance and convenience of variogram-based exploratory and 
prediction techniques to perform a complete analysis of multiple correlated time series. In 
particular, a thorough case study on a multivariate data set concerning the time series of PM 10 
daily concentrations (principal variable), daily levels of Temperature and Wind Speed (secondary 
or auxiliary variables) recorded at a monitoring station located in an area with high risk of 
pollution, is faced through the following steps: a) identification of trends and periodicity which 
characterize each time series, b) deseasonalization of the analyzed time series, c) fitting a temporal 
linear coregionalization model (T-LCM) to the multiple correlated residuals, d) predictions of the 
PM 10 concentrations at some time points after the last available observation, by using the auxiliary 
information coming from the secondary variables. As regards the computational aspects, a new 
version of the GSLib cokriging routine (Deutsch and Joumel, 1997) has been implemented for 
prediction purposes. The proposed routine allows the use of the T-LCM fitted to the observed time 
series and the setting of search neighborhoods in the time domain under study. 

Keywords: linear coregionalization model, temporal cross-variogram, temporal cokriging 

1 Introduction 

For the analysis of time series, Box-Jenkins methodology (Box and Jenkins, 1976) is commonly 
applied in order to detect the most suitable model which reasonable might describe the temporal 
evolution of the process under study. The fitted model is subsequently used for prediction 
purposes. On the basis of the Box-Jenkins approach, the auto-correlation and the partial auto¬ 
correlation functions (ACF and PACF, respectively), as well as the cross-correlation functions 
(CCF), computed for the observed time series have a crucial role in the modeling selection, indeed 
through the visual inspection of the sample ACF, PACF and CCF, the most appropriate model for 
the process under study can be identified. 
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In the multivariate context, several approaches have been proposed in order to model the joint 
relationships between multiple time series. Among the different types of models developed 
(Reinsel, 2003), the most common are VAR (Vector Auto-Regressive), ARMAX or ARIMAX 
(Auto-Regressive/Moving-Average or Auto-Regressive/Integreted/ 

Moving-Average models in the presence of exogenous variables), the models based on a 
transferring function and the co-integrated models (mainly used in the economic field). 

However, for the analysis of multiple correlated time series, multivariate geostatistics could also 
be a very useful approach, nevertheless it is widely applied to analyze, through the matrix 
variogram, the spatial direct and cross correlation which characterize the variables of interest and 
to predict the corresponding spatial phenomena. 

In this paper the use of variogram-based multivariate geostatistical techniques have been enlarged 
to analyze multiple time series, in order to identify trends and periodicity exhibited by data, 
describe the regularity of temporal data, model the temporal evolution of the variables and make 
temporal predictions for the primary variable using the auxiliary information coming from the 
secondary available variables. The computational aspects have been tackled by implemented a 
new version of the GSLib cokriging routine (Deutsch and Joumel, 1997) which allows the analyst 
to use the fitted T-LCM in the cokriging system and define appropriate temporal search 
neighborhoods for prediction purposes. 

2 Variogram-based modeling and prediction techniques for multiple time series 

In time series analysis, the measurements of p > 2 correlated variables, at different time points or 
intervals, can be considered as a finite realization of a real-valued Multivariate Random Process 
(MRP) (Z(t), tefci}, with 


Z(t) = [Z 1 {t),Z 2 {f) . Z p (t)] T . (1) 

Under second-order stationarity, the mean vector of Z exists and does not depend on t, and the 
(p x p) variogram matrix T defined for two MRP, Z(t) and Z(t'), exists and depends on the 
temporal separation h, i.e.: 

r[z(t),z(t')] = E{[z(t) - Z(t')][z(t) - Z(t')] r ) = r(/i) = [ YlJ m> 

where h = (t — t r ), and Yij(h), i,j = 1,2, ...,p, are the cross-variogram between the random 
variables Z, (t) and Zj(t + h), when i ^ j, and the direct variogram of the i-th random variable, 
when i = j. 
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In the multivariate context, the temporal variogram matrix computed for the p > 2 time series 
under study can be modelled through the most used model in the spatial multivariate analysis, 
namely the Linear Coregionalization Model (LCM) (Chiles and Delfiner, 1999; Wackemagel, 
2003; Goovaerts, 1997; Distefano et al., 2015). 

In this case, a temporal LCM (T-LCM) can be developed as follows: 

L 

r(/i) = ^ B igi (h), (2) 

1=1 


where B t = [b\j], i,j = 1, ... ,p, are (p x p) positive-definite matrices and giQi), l = 1,2,..., L, are 
basic temporal variograms identified at L > 2 variability scales. 

In the structural analysis, before modeling the temporal correlation described by the direct and 
cross-variograms, their estimation from the available data are required. The following classical 
estimators are often used for the direct and cross temporal variograms, respectively: 

fe(r) = 2SrtiZ m+v-zvr (3) 

1 N t (r) 


Yijir ) = l/(2|N y (r)|) ^ [(Z*(t + h) - Z t (t )) • C Zj(t + h) - Z ; (t))], 

Nij(r) 


(4) 


where 

• r is the temporal lag, 

• iV;(r) = [t,t + h E H it i = 1, ...,p, such that \r — h\ < 5}, and |/V,-(r)| is the cardinality of 
this set, 

• blij (r) = {t,t + h E (Hi n = 1,2, ...,p, i ^ j such that \r — h\ < <5), and |Al i7 -(/i)| is 

its cardinality, 

• S is the tolerance, 

• Hi is the set of data at different time points (not necessarily equally-spaced) of the i-th time 
series, i = 1, ...,p. 

As pointed out in De Iaco et al. (2013), the variogram, widely used in the geostatistical context, 

could be efficiently applied in time series analysis (Gevers, 1985; Haslett, 1997), since it can 
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describe a wider class of stochastic processes (namely, the class of intrinsic stochastic processes, 
which includes the class of second-order stationary stochastic processes) and also the variogram 
estimation does not require the knowledge of the expected value of the associated stochastic 
process. Moreover, the variogram is a useful tool to identify trend and periodicity exhibited by 
data, as it will be highlighted in the case study. Finally, this mesaure of the direct and cross 
temporal correlation among the time series under study, can be used for prediction purposes. 
Indeed, the estimation of the unknown value of the stochastic process (Z(t), t E T], using the data 
observed in the past (extrapolation mode), or the data observed before and after the time point t 
(interpolation mode), can be easily supported by multivariate geostatistical tools, as described in 
the following. 

For a second-order stationary MRP Z, a linear prediction of the time series under study at an 
unsampled time point t E T, can be obtained by using the well-known cokriging predictor 
(Matheron, 1963). In this case, the temporal cokriging formulation can be given as follows: 

N 

Z(t) = ^ A«(t)Z(t a ), (5) 

a-1 

where t a E T, a = 1,2, ...,N, are the sampled points and A a (t), a = 1,2,..., N, are the (p x p) 
matrices of the weights which are determined so that the predictor (5) is unbiased and efficient 
(Joumel and Huijbregts, 1981): 


r(ti-ti) • 

• r(t 1 — t N ) 

/ 

-Ai(t)- 


t ih-ty 

r(t N — g) 

/ 

r(t w — t N ) 

/ 

i 

0 

A w (t) 

£ 

= 

r(tjv — t) 

/ 


where p is the (p x p) matrix of Lagrange multipliers. 

If T is conditionally strictly negative definite, then the above system presents one and only one 
solution. 

The ordinary cokriging (Matheron, 1963) requires only the knowledge of the model for the matrix 
variogram and it is used when the expected value of the process is constant and unknown. Since 
the cokriging system can be expressed in terms of the variogram, as in (6), the cokriging predictor 
can be used even when the stochastic process under study satisfies the intrinsic hypothesis. 
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3 A case study 

The potentiality of geo statistical techniques, and especially the role of variogram, to solve 
estimation and prediction problems in time series analysis is pointed out in the following case 
study. 

Two atmospheric variables, i.e. daily Temperature, expressed in °C, and daily Wind Speed 
(m/ sec), as well as particulate matter daily concentrations of aero-dynamic diameter less than 10 
micrometer (PM 10 ), expressed in /rg/m 3 , and measured at one of the monitoring stations of 
Brindisi district (Southern Italy) during the period 2010-2013, have been analyzed using 
multivariate geostatistical techniques. The survey station, called "Torchiarolo" (Fig. 1) belongs to 
the environmental monitoring network of the Apulian Protection Agency. This station is strictly 
close to an industrial site, i.e. the thermoelectric power station Enel-Federico II, located in Cerano 
(Brindisi district). Because of the presence of this big firm, all the surrounding area is considered 
being at high risk of air pollution, especially during the winter and during long period of low 
rainfall. The pollutant under study is strongly influenced by meteorological conditions. In 
particular, the horizontal transport, dispersion and re-suspension of PM 10 are mainly determined 
by wind speed: low values of this meteorological variable are related to high PM 10 concentrations 
(Harrison et al., 1997; Sayegh et al., 2014). Moreover, temperature is considered as one of the 
strongest predictors of PM 10 concentrations. High values of this air pollutant are measured in 
winter, specially when the difference between maximum and minimum daily temperature is large 
(Perez et al., 2002); this difference represents an important meteorological factor for the 
predictions of daily PM 10 concentrations. 



Fig. 1 . Torchiarolo survey station of PM 10 concentrations and meteorological data. 


In the following sections, the advantages and the flexibility of the multivariate geostatistical 
techniques to analyze the times series under study will be pointed out during a) the structural 
analysis, b) the modeling procedure of the temporal direct and cross-correlation of the variables, 
c) the prediction of the variable of interest (PM 10 ) at time points after the last available data. 


47 





Extended Abstracts of GeoENV2018 - Belfast, Northern Ireland 


4-5 July 2018 


3.1 Exploratory data analysis 

In order to assess the statistical properties of the analyzed variables (PM 10 , Temperature and Wind 
Speed) exploratory data analysis has been performed. 

The time series of the variables under study are illustrated in Figg. 2-a) and -b). 

In particular, during the analyzed period (2010-2013) monthly average Temperature presents a 
periodic behavior (Fig. 2-a)). Moreover Temperature and Wind Speed are characterized by 
opposite seasonal behaviors: in winter time, Temperature decreases, while Wind Speed increases; 
on the other hand, in summer time Temperature increases and Wind Speed decreases (Fig. 2-c)). 

The time plot in Fig. 2-b) shows that the time series of PM 10 daily concentrations presents an 
annual periodicity at 12 months. It is important to highlight that in order to protect the human 
health, a national law fixed a limit value for the PM 10 daily concentrations which cannot be greater 
than 50 gg/m? for more than 35 times per year. In this case study, over the four-year span (from 
2010 to 2013), the PM 10 daily values exceeded the threshold 243 times; in addiction, more than 
35 exceedances per year have been measured by the monitoring station. 



- Temperature C°Q -- Wind Speed (m/s) - PM lo (pg,ti^ 


Fig. 2. Time plot of monthly average for a) Temperature and Wind Speed, b) PM 10 , and box plot 
of c) Temperature and Wind Speed and d) PM 10 daily concentrations, grouped by month 
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Moreover, this variable is characterized by a seasonal component. During summer time, particle 
pollution shows lower levels compared to those recorded during winter time; in particular, in this 
season, PM 10 does not exceed the limit value fixed by the national law, as illustrated in Fig. 2-d). 
On the other hand, in winter time changes in the lower layer of the troposphere determine PM 10 
stagnation. Hence, high levels of this pollutant are measured. 

3.2 Structural analysis and modeling 

The available data have been considered as a realization of a second-order stationary MRP Z(t) = 
\Z 1 (t), Z 2 (t), Z 3 (t)] T , t E T Q R, whose components are associated with, respectively, the PM 10 
daily concentrations (Z 1 ), the Temperature (Z 2 ) and the Wind Speed (Z 3 ) daily averages. 

Since the time series under study are characterized by periodicity, as highlighted by the exploratory 
data analysis, these periodic components could be factored out. Moving average and monthly 
averages techniques have been applied in order to obtain PM 10 , Temperature and Wind Speed 
residuals. Successively, the structural analysis has been developed by considering the residuals for 
the primary and secondary variables. 

After computed the sample direct and cross temporal variograms of the residuals, two different 
scales of temporal variability have been detected through the visual inspection of the sample 
variograms. Hence, the following T-LCM has been fitted to the sample matrix variogram: 

T (K) = B 1 gi (h) + B 2 g 2 (h), (7) 

where g x is the short-scale temporal component described by the exponential model (Cressie, 
1993) with unit sill and range equal to 30 days and g 2 is the long-scale temporal component 
described by an exponential model with unit sill and range equal to 365 days. The positive-definite 
matrices B 0 1 = 1,2, are, respectively: 



" 230 

3.07 

-3.4 ' 


\ 30 

1.2 

-1.1 ' 

B x = 

3.07 

-3.4 

4.9 

-0.025 

-0.025 

0.58 

, b 2 = 

1 

k) 

0.37 

-0.016 

-0.016 

0.073 


Hence, the sample temporal variograms (Fig. 3) of the residuals have been modelled as follows: 

Kii(h) = 230g 1 (h) + 30g 2 (h), 
y 22 (h) = 4.9 gi(h) + 0.37 g 2 Qi), 
y 33 {h) = O.SQg^K) + 0.073g 2 (ft). 
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On the other hand, the following models have been fitted to the sample temporal cross-variograms 
of the residuals (Fig. 4) : 


Y 12 (h) = 3.07 g 1 (h) + 1.20 2 (h), 


Yi3(h) = -3.40!(h) - l.l0 2 (ft). 


y 23 (h) = —0.0250!(h) - O.O160 2 (h). 

It is worth noting that the sample temporal variograms of the observed time series, illustrated in 
Fig. 3, show the periodic behaviors which characterize the same time series, as already found out 
during the exploratory data analysis. 

At this point, it is convenient to check if the fitted model (7) can be considered suitable to make 
predictions of the primary variable, thus a validation procedure has been properly performed to the 
same model. 
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Fig. 3. Sample temporal direct variograms of a) PM 10 daily concentrations and residuals, b) 
Temperature daily averages and residuals, c) Wind Speed daily averages and residuals, and fitted 
models. 
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Fig. 4. Sample temporal cross-variograms of a) PM 10 and Temperature residuals, b) PM 10 and 
Wind Speed residuals, c) Temperature and Wind Speed residuals, and fitted models. 
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3.3 Model validation and temporal prediction 

The goodness of the fitted T-LCM (7) has been evaluated through cross-validation, which allows 
the estimation for PM 10 residuals at all data points. It is important to point out that the T-LCM (7) 
has been validated using a modified version of the GSLib program "Cokb3D" (Deutsch and 
Joumel, 1997), named "T-Cok". This program has been developed in order to make temporal 
predictions of the time series of interest by using a) the auxiliary variables, b) the T-LCM fitted to 
the available data, c) a properly defined neighborhood, i.e. a subset of time data which can be 
considered in the cokriging system. 

Hence the cross-validation has been performed and the correlation between PM 10 residuals and 
estimated ones has been measured. The high values of the linear correlation coefficient (0.780) has 
confirmed the goodness of the fitted T-LCM. Therefore, the model (7) can be used in order to 
obtain predictions for the variable under study in time points after the last available data. In 
particular, PM 10 residuals have been predicted for six time points after the last available data (the 
31st of December 2013), hence cokriging temporal predictions have been computed from the 1st 
to the 6th of January 2014, by using, the new GSLib routine "T-Cok". The deseasonalized PM 10 
observations, the residuals of the auxiliary variables (Temperature and Wind Speed) and the T- 
LCM (7) are the input information for the "T-Cok". Then, the diurnal component has been added 
to the predicted PM 10 residuals in order to obtain predictions of PM 10 daily concentrations. 

In Fig. 5, the time series of PM 10 daily concentrations measured from the 9th of December 2013 
to the 6th of January 2014 is shown together with the predicted PM 10 values for the period ranging 
from the 1st to the 6th of January 2014. 

It is worth noting that, the behavior of the particle pollution predicted values is quite similar to the 
true PM 10 daily concentrations; moreover as it is for the true values recorded by the station in the 
period 1-6 January 2014, some predicted values are greater than the limit value fixed by the 
national law (50 / ug/m 3 ) and it can represent a hazardous condition for air quality and human 
health. 
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Fig. 5. Time plot of PM 10 predicted values and PM 10 daily concentrations ( ng/m 3 ), from the 1 st 
to the 6th of January 2014 

4. Conclusions 

In the present paper, the time series of PM 10 daily concentrations and two meteorological 
variables (Temperature and Wind Speed) highly correlated with the pollutant under study, have 
been analyzed through multivariate geostatistical techniques. The importance and the advantages 
of using variogram-based procedures have been highlighted during both modeling and prediction 
stages. 

As further developments, the results obtained by considering PM 10 time series recorded at 
different types of stations (industrialized, urban or rural stations) belonging to the same 
environmental monitoring network, could be compared in order to identify possible differences in 
temporal scales of variability. 

The scientific community should consider the flexibility of the geostatistical tools, such as the 
variogram, for the analysis of time series and more theoretical and computational efforts should 
be made in order to extend the variogram-based techniques in the time domain. 
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Abstract 

Radon is the main source of ionising radiation dose received by the general population, being the 
second most important cause of lung cancer after smoking. Indoor radon concentration depends 
on several factors, including soil-gas radon concentration (i.e. the main indoor radon source) and 
soil properties related to radon transport in the environment (e.g. permeability, texture). In this 
study we analyse the applicability of topsoil geochemical data from the Tellus project for radon 
mapping, evaluating how these data may be used to improve the spatial resolution of the Irish 
national radon risk map, once the Tellus project has completed coverage for the island of Ireland. 

A multivariate analysis of topsoil geochemistry was carried out to cluster data in similar 
geochemical groups, and subsequently an analysis of these groupings with indoor radon 
concentration was performed. The first step in data interpretation was to analyse non-detected 
values, following which a Multivariate Outliers Detection for Compositional Data using robust 
Mahalanobis distances was applied. Finally, cluster analysis of the variables and the observation 
were performed. The analysis has not identified a clear correlation between uranium in topsoil and 
indoor radon concentration. However, correlations between the different cluster groups and the 
Quaternary geology were recognised and these may help to improve the Quaternary map. 

1. Introduction 

Topsoil geochemistry accounts for up to 40% of observed indoor radon concentration variance, 
with geology and airborne radiometric data for another 7% of the total variance (Ferreira et al. 
2016). This implies topsoil geochemistry could be an important factor to consider in radon 
mapping. This study was therefore designed to analyse the applicability of topsoil geochemistry 
data obtained in the Tellus project and evaluate how these data may be used to improve the 
resolution of the Irish indoor radon risk map when the Tellus project covers the whole of the island 
of Ireland. 

2. Topsoil geochemistry data 
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In total 3,540 soil samples were acquired for the Tellus Border project, following a systematic 
sampling design. In addition to pH analysis and Loss-On-Ignition at 450°C (LOI%), the 
concentrations of 52 analytes were measured by ICP (data are freely available on the Tellus 
website; www.tellus.iel . The summary statistics of Tellus Border results are presented in Table 1. 


Table 1. Summary of the Tellus Border topsoil geochemical data 


Variable 

Mean 

Median 

SD 

Min. 

Max. 

Qi 

Q3 

U 

1.29 

1.00 

2.74 

0.03 

108.00 

0.54 

1.42 

Th 

2.57 

2.50 

1.93 

0.03 

24.00 

0.90 

3.80 

K 

0.12 

0.11 

0.07 

0.01 

0.56 

0.06 

0.17 

Al 

1.06 

1.01 

0.74 

0.01 

3.62 

0.40 

1.64 

Fe 

1.94 

1.91 

1.56 

0.01 

15.00 

0.64 

2.84 

Ca 

0.45 

0.23 

1.16 

0.01 

15.00 

0.14 

0.37 

Na 

0.02 

0.02 

0.02 

0.00 

0.30 

0.01 

0.03 

Mg 

0.31 

0.18 

0.29 

0.01 

3.23 

0.11 

0.46 

s 

0.15 

0.08 

0.14 

0.01 

1.01 

0.06 

0.24 

p 

806.89 

750.00 

409.57 

48.11 

3850.00 

502.50 

1040.00 

Ba 

72.62 

56.60 

91.77 

3.85 

3464.50 

27.40 

93.40 

Cr 

22.29 

16.25 

21.44 

0.37 

372.41 

5.46 

34.24 

Cu 

22.73 

18.46 

23.74 

0.37 

413.14 

5.79 

31.72 

Li 

11.52 

9.00 

9.48 

0.43 

44.00 

3.00 

19.00 

Mn 

460.07 

261.90 

820.37 

1.77 

8351.50 

83.70 

551.18 

Ni 

16.51 

11.90 

15.62 

0.44 

181.00 

3.30 

25.90 

Sr 

28.83 

16.40 

77.51 

0.43 

1900.00 

11.23 

26.80 

V 

27.14 

25.70 

18.51 

0.93 

176.10 

11.00 

39.40 

Zn 

54.30 

46.00 

49.92 

0.97 

1030.00 

24.00 

74.00 

Zr 

1.72 

1.40 

1.35 

0.14 

22.10 

0.70 

2.40 

Ag 

0.08 

0.07 

0.08 

0.00 

2.18 

0.05 

0.10 

As 

6.50 

4.80 

19.12 

0.14 

960.80 

2.10 

7.60 

Be 

0.47 

0.40 

0.37 

0.01 

3.60 

0.20 

0.70 

Bi 

0.15 

0.13 

0.11 

0.01 

3.55 

0.10 

0.18 

Cd 

0.41 

0.29 

0.52 

0.02 

14.80 

0.19 

0.45 

Ce 

32.75 

29.95 

27.10 

0.44 

612.00 

11.90 

49.50 

Co 

6.76 

5.60 

6.83 

0.20 

185.00 

1.30 

10.70 

Cs 

0.64 

0.63 

0.51 

0.01 

8.21 

0.28 

0.90 

Ga 

4.15 

3.80 

2.86 

0.09 

18.00 

1.70 

6.50 

Hg 

0.10 

0.09 

0.07 

0.00 

1.66 

0.06 

0.13 

La 

14.20 

12.60 

11.40 

0.10 

196.00 

5.40 

21.40 

Lu 

0.07 

0.06 

0.07 

0.00 

1.17 

0.02 

0.09 

Mo 

1.38 

0.88 

2.23 

0.04 

39.10 

0.60 

1.42 

Nb 

0.40 

0.25 

0.43 

0.03 

4.49 

0.15 

0.46 

Pb 

25.86 

20.25 

36.48 

0.17 

1116.40 

14.90 

27.80 

Rb 

12.34 

12.10 

8.90 

0.14 

45.60 

3.80 

19.58 

Sb 

0.35 

0.27 

0.36 

0.02 

10.50 

0.17 

0.43 

Sc 

2.78 

2.50 

2.14 

0.09 

19.40 

0.90 

4.20 

Sn 

0.95 

0.70 

1.62 

0.05 

55.50 

0.50 

1.00 

Tb 

0.28 

0.26 

0.23 

0.00 

3.97 

0.10 

0.39 

T1 

0.16 

0.14 

0.16 

0.01 

4.22 

0.07 

0.20 

Y 

6.79 

5.70 

7.09 

0.12 

163.00 

2.35 

8.85 

Yb 

0.47 

0.40 

0.45 

0.01 

8.30 

0.20 

0.60 


Al, Ca, Fe, Mg, Na, S concentrations are in %, all the others values are in mg kg 1 


57 









Extended Abstracts of GeoENV2018 - Belfast, Northern Ireland 


4-5 July 2018 


3. Data preparation 

The first step in data interpretation was to analyse non-detected values. The original dataset 
(download from Tellus webpage) assigns a value equal to one-half of the detection limit (0.5 DL) 
to the points where the concentration was lower than the detection limit. However, to produce 
more realistic values a multivariate method (i.e. alr-EM algorithm, Palarea-Albaladejo and Martin- 
Femandez 2013) was applied (Table 1). For several analytes the proportion of non-detected values 
was very high (e.g. Table 2 and Table 3). Although data imputation is also possible in the elements 
with more than 65% of non-detected values (Table 3), these elements were not considered in this 
study. 


Table 2. Elements with 10-40% of the reported data lower than the D.L. 



Cr 

Li 

Na 

Zr 

Be 

Cs 

Lu 

Tb 

Tl 

Yb 

D.L. 

1 

1 

0.01 

0.5 

0.1 

0.05 

0.01 

0.02 

0.02 

0.1 

N° < D.L. 

459 

621 

1300 

502 

830 

429 

667 

415 

375 

804 

N° > D.L. 

2981 

2819 

2140 

2938 

2610 

3011 

2773 

3025 

3065 

2636 

Percentage 

13 

18 

38 

15 

24 

13 

19 

12 

11 

23 


Table 3. Elements with more than 65% of the reported data lower than the D.L. _ 

B Ti Ge Hf In Se Ta Te W 


D.L. 

10 

0.01 

0.1 

0.05 

0.02 

1 

0.05 

0.05 

0.1 

N° < D.L. 

3411 

2931 

3411 

2480 

2306 

2560 

3163 

3207 

3332 

N° > D.L. 

29 

509 

29 

960 

1134 

880 

277 

233 

108 

Percentage 

99 

85 

99 

72 

67 

74 

92 

93 

97 


Data imputation is useful to analyse correlations between analytes. However, to carry out statistical 
inferences (i.e. estimate the mean or median, with the confidence interval) it is also required to use 
some methods which take into account the uncertainty generated by the presence of non-detected 
values (Palarea-Albaladejo and Martin-Femandez 2013), for example bootstrap resampling. 

After the missing-data imputation, a Multivariate Outliers Detection for Compositional Data using 
robust Mahalanobis distances was applied to the dataset (function: mvoutlier.CoDa, R-package: 
mvoutlier; P. Filzmoser and Gschwandtner 2015; Peter Filzmoser, Hron, and Reimann 2012). The 
spatial distribution of detected outliers is plotted in Fig la, where red crosses are high extreme 
outliers, blue crosses low extreme outliers, and green and light blue crosses are observations more 
close to the origin (Filzmoser et al. 2012). Although some insolate points are found in the area, 
outliers are detected principally in the north section, south-west and in the Cooley Peninsula 
(circles). Thus, the detected outliers seem to be related more with some physical effect (e.g. 
lithological) than with error in the sample process (e.g. soil sample and laboratory analysis). 
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Fig 1. a) Spatial distribution of outliers, and b) compositional biplot of the first two principal 
components. 

The first principal component (PCI), dominated by S, Na and Sr, accounts for about 55% of the 
total variability (Fig lb). To analyse the influence of the high percentage of non-detected values 
for some elements (i.e. up to 38 % for Na, Table 2), the data were reinterpreted with only elements 
that have less than 10% non-detected values. Outliers in this case follow the same trend (not shown 
in this paper), and are grouped in specific areas. Sulphur is again the dominant element in the 
principal component that accounts for more variability (up to 58 %, PC 1). This result suggests that 
the possible outliers are related to soil properties, probably to its concentration in organic matter 
since S is typically related to the amount of organic matter (O.M.) present in soils. The similar 
spatial distributions of outliers (Fig la). Sulphur in topsoil (Fig. 2a), and the content of organic 
matter (O.M) in soil (Fig. 2b) support this hypothesis. 



Fig. 2: a) Spatial distribution of S [%] in topsoil, and b) soil classification based on Organic Matter 
content (INDOT Criteria and supposed SOM[%] = LOI[%], Huang et al. 2009) 
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The importance of peat soils in relation to radon risk assessment is that their permeability is 
normally low, reducing the possibility of radon transport into the house. Another effect to be 
evaluated is that as radon has high solubility in organic phases (Schubert et al. 2002, 2005, 2007; 
Garcia-Gonzalez et al. 2008; De Simone et al. 2015), the organic matter of the peat soil may act 
therefore as a trap for radon, reducing its concentration in the soil-gas and its mobility in the 
environment. These combined effects may cause a lower radon risk in peats compared to other 
Quaternary deposits (Elio et al. 2017). 

4. Cluster analysis 

Cluster analysis was carried out using only elements with less than 10% of values below the 
detection limit, in order to avoid possible errors caused by missed values (e.g. Tempi et al., 2008). 
A variable clustering analysis helps to interpret the geological processes occurring in an area and 
therefore allows selection of variables that better fit with the objectives of the study, reducing the 
dimensionality of the dataset (Tempi et al. 2008). We performed the analysis with the data log- 
transformed and standardised (Fig. 3), and using the Ward clustering criteria and the Euclidean 
distance (Murtagh and Legendre 2014). For example, in Fig. 3 it is evident that S and LOI have 
similar behaviour, probably due to the relationship between S, LOI and organic matter. 



Fig. 3. Cluster analysis of the variables (log-transformed, standardised, variables with less than 
10% of non-detected, pH and LOI) 

Cluster analysis is also useful to evaluate if there are samples with similar geochemical 
composition, and if these samples are clustering in a specific geographic area due to, for example, 
different geological properties (e.g. occurrence of Quaternary sediments). The cluster analysis in 
this case (Fig.4 and Fig. 5) was carried out after an isometric log-ratio transformation and 
standardisation, and both Ward clustering criteria and Euclidean distance were applied. Finally a 
Linear Discriminant analysis (LDA) was used to re-assign the data in six groups (Fig. 6a: Group 
1: 338 samples. Group 2: 440 samples, Group 3: 1144 samples, Group 4: 56 samples ,Group 5: 
790 samples, and Group 6: 672 samples). 
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Fig.4. Cluster analysis of the observations (isometric log-ratio transformation, standardised; 
variables with less than 10% of non-detected values; without pH and LOI) 



Fig. 5. Clusters, with 2, 3 and 6 groups 



Clusters 

■ G1 

■ G2 

■ G3 

■ G4 
G5 

■ G6 



Sediment_500K 

■■Peat 

Lac sediments 
Mluvium 
"Wind blown Sd 
■Mad & Ed 
pe deposits 
■Gf/GI Sd & Grv 
■Gm Sed 
3Ls till 

■Sst & Sh till 
■Sst till 

■Devonian Sst till 
■Grt till 

■Meta rocks till 
Bedrock 


Fig. 6. a) Cluster observations after re-assignment by LDA and grouping in polygons (Thiessen 
polygons), and b) part of the all-Ireland Quaternary geology map scale 1:500k (GSI; Ed: estuarine 
deposits, Grt: granites, Grv: gravel, Gf: glaciofluvial, Gl: glaciolacustrine, Gm: glaciomarine, Lac: 
lacustrine, Ls: limestones, Mad: Marine deposits, Meta: metamorphic, Sd: sand, Sed: sediments, 
Sst: sandstones, Sh: shales) 
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A spatial aggrupation of the six geochemical clusters is clearly appreciated (Fig. 6a), which is 
correlated to Quaternary geology (Fig. 6b). The median values of pH, LOI, and the 43 analystes 
with less than 10% of non-detected of each cluster are shown in Table 4. 


Table 4. Median values of the six different cluster groups 



Group 1 

Group 2 

Group 3 

Group 4 

Group 5 

Group 6 

pH 

3.40 

3.36 

4.58 

7.13 

4.93 

4.50 

LOI 

87.20 

96.70 

15.10 

6.29 

15.30 

15.55 

U 

0.42 

0.14 

1.23 

0.27 

1.09 

1.21 

Th 

0.50 

0.09 

2.40 

0.80 

3.50 

3.90 

K 

0.05 

0.04 

0.10 

0.06 

0.17 

0.14 

Al 

0.24 

0.08 

0.97 

0.17 

1.93 

1.15 

Fe 

0.49 

0.13 

1.88 

0.33 

2.86 

2.11 

Ca 

0.15 

0.18 

0.26 

6.54 

0.31 

0.18 

Na 

0.03 

0.04 

0.01 

0.08 

0.02 

0.02 

Mg 

0.10 

0.15 

0.12 

0.22 

0.61 

0.31 

S 

0.34 

0.34 

0.07 

0.07 

0.07 

0.08 

P 

540.00 

380.00 

800.00 

295.00 

1030.00 

680.00 

Ba 

21.80 

15.20 

56.60 

19.90 

102.80 

58.50 

Cr 

1.86 

0.76 

17.45 

1.86 

47.43 

17.45 

Cu 

4.17 

3.05 

21.00 

1.53 

32.00 

18.67 

Li 

0.79 

0.71 

9.00 

3.00 

24.00 

11.00 

Mn 

58.00 

53.00 

340.00 

120.00 

500.00 

239.00 

Ni 

2.40 

1.80 

14.20 

2.60 

34.95 

9.10 

Sr 

19.90 

29.85 

14.90 

393.50 

12.50 

17.65 

V 

6.70 

4.60 

25.70 

8.90 

42.50 

26.70 

Zn 

18.00 

25.00 

51.00 

8.50 

79.00 

38.50 

Zr 

1.00 

0.60 

1.30 

0.39 

2.50 

1.40 

Ag 

0.06 

0.04 

0.07 

0.01 

0.07 

0.09 

As 

2.10 

1.10 

5.80 

3.00 

6.70 

3.90 

Be 

0.08 

0.05 

0.60 

0.08 

0.70 

0.40 

Bi 

0.11 

0.12 

0.12 

0.03 

0.17 

0.14 

Cd 

0.23 

0.29 

0.47 

0.08 

0.29 

0.17 

Ce 

7.16 

1.61 

25.60 

9.92 

49.70 

46.70 

Co 

0.80 

0.40 

6.10 

0.90 

12.60 

5.50 

Cs 

0.11 

0.04 

0.73 

0.16 

0.88 

0.56 

Ga 

0.90 

0.40 

3.40 

0.70 

7.50 

5.10 

Hg 

0.13 

0.12 

0.08 

0.02 

0.09 

0.07 

La 

3.50 

0.80 

10.50 

5.20 

21.40 

20.65 

Lu 

0.01 

0.01 

0.07 

0.03 

0.07 

0.08 

Mo 

0.82 

0.63 

1.26 

0.16 

0.87 

0.78 

Nb 

0.30 

0.17 

0.16 

0.21 

0.34 

0.50 

Pb 

20.60 

22.00 

19.20 

1.80 

23.80 

17.05 

Rb 

1.40 

0.60 

12.40 

3.60 

21.10 

12.60 

Sb 

0.33 

0.46 

0.22 

0.06 

0.38 

0.18 

Sc 

0.60 

0.30 

3.00 

0.60 

4.30 

2.30 

Sn 

0.60 

0.60 

0.70 

0.17 

0.90 

0.70 

Tb 

0.05 

0.02 

0.27 

0.10 

0.37 

0.32 

Tl 

0.04 

0.02 

0.18 

0.03 

0.17 

0.14 

Y 

1.27 

0.40 

7.16 

2.74 

7.16 

6.77 

Yb 

0.09 

0.04 

0.50 

0.20 

0.50 

0.50 


Al, Ca, Fe, K, Mg, Na, S concentrations are in %, all the others values are in mg kg -1 
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Although further investigations are needed in order to properly identify the provenance of the 
different deposits and the geological processes that took place in the area, some differences are 
apparent between the clusters (Table 4). For example, Group 1 and Group 2 have the highest values 
of LOI and S, and the lowest pH, which may be associated to peat soils. G4 is the cluster with the 
highest pH and the highest concentration in Sr, Ca, and Na, which may be related to a sea spray 
deposition along the west coast (Fig. 6a). 

On the other hand, G6 seems to be correlated with tills derived from metamorphic rocks, with 
relatively high concentration of Al, Fe, and some trace elements (e.g. Th, Ce, La, Y). G3 is 
principally located in the central part of the area, linked to limestone tills and sandstones and shale 
tills. Although further analysis are required to examine if these two deposits have different 
geochemical composition, this cluster is characterized by having relatively high concentrations of 
U, Th, and K, as well as Fe, P, Mn, Cd, Mo, and Y. Finally, G5 is related to the sandstones and 
shale tills of the south-east area, and this cluster has the highest values of Al, Fe, and Mg, and trace 
elements (e.g. Th, K, P, Mn, Ni, etc.). 

The analysis of topsoil geochemistry may help therefore to improve the resolution of the 
Quaternary map, and given the relationship between Quaternary deposits and indoor radon risk 
(Elio et al. 2017), helps indirectly to improve the spatial resolution and accuracy of the radon risk 
map. 

5. Indoor radon and topsoil geochemistry 

A total of 5,755 dwellings were measured by the Environmental Protection Agency of Ireland 
(EPA) in the study area (Fig. 7). Indoor radon follows approximately a log-normal distribution, 
with a geometric mean of 51 Bq m" 3 and a geometric standard deviation of 2.94 Bq m' 3 . The 
minimum value in the area was 0.1 Bq m' 3 (< D.L. of 10 Bq m' 3 ) and the maximum was 5,613 Bq 
m" 3 . The area is mostly classified as Non-High Risk (i.e. probability of having an indoor radon 
concentration higher than the reference level of 200 Bq m' 3 is lower than 10%), although some 
areas in the Cooley Peninsula (SE), Sligo (SW), Cavan (S), and Donegal (Centre) are classified as 
High Risk (Fig. 7). 
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Fig. 7. indoor radon risk map (Elio et al. 2017) and radon measurements (EPA) 


Since radon is generated as an intermediate daughter product in the uranium decay series, it is 
reasonable to expect higher indoor radon concentrations in areas with higher uranium 
concentration in the topsoil. However, results obtained in this study do not support this hypothesis 
and other factors appear to have more influence on indoor radon than the concentration of uranium 
in the soil. In this regard, the different cluster groups showed some differences in soil uranium 
concentration (Fig. 8a) but not in the indoor radon concentration (Fig. 8b). 




Fig. 8. Boxplot diagrams of (a) uranium concentration in topsoil and (b) indoor radon 
concentration, classified by cluster group G1 - G6. 

A Tukey test, which estimates the confidence intervals on the differences between the mean values 
of the clusters (Fig. 9), confirmed differences in uranium concentrations for the different cluster 
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groups. However, differences in indoor radon are not evident. Further investigations are therefore 
required to better understand the relationship between topsoil uranium and indoor radon. 
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Fig. 9. Tukey test of topsoil uranium and indoor radon by cluster groups 
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Abstract 

We propose a methodology for ore body modeling using geological modeling, multiple-point 
geostatistical (MPS) simulation, cluster analysis, inverse transform sampling for physical property 
model and geophysical data. MPS was applied to analyse the three-dimension (3D) structures for 
the mine evaluation using aeromagnetic data. In general, 3D geological models are obtained 
through limited data known by borehole data and geological data. These models are unsuitable for 
expressing spatial underground and contain uncertainties. However, 3D ore models using MPS are 
effective in describing complex structures and irregular ore deposit, and can present stochastic 
models in study areas. In addition, it can be used to evaluate the economic feasibility and suggest 
plans of mine development. The proposed methodology was applied to metal mines in Republic 
of Korea. Geological modeling was carried out to generate reliable training image (TI) taking into 
account azimuth, dip and genesis of ore deposits in the study areas. This TI should reflect the 
geological features and describe the patterns of rock facies to provide a meaningful evaluation. 
The MPS using TI was applied SNESim (Single Normal Equation Simulation) algorithm and 
various probability implementation models were obtained. Probabilistic methods bear the 
characteristics of statistics while providing various realisations for different geological 
phenomena. For this analysis, cluster analysis using Euclidean distance, distance-based 
multidimensional scaling (MDS), radial basis function (RBF) kernel and k-medoids was conducted 
to reduce the errors stemming from uncertainty and comprehend the characteristics of the ore body 
distribution pattern. Geological representative model of each cluster was constructed by 3D 
physical property model by inverse sampling using the magnetic susceptibility of the rock 
laboratory tests. This is forward modeling that provides a magnetic anomaly, and aeromagnetic 
data were used to support the analysis of forward model result. Ore body modeling using MPS 
with aeromagnetic data was able to present the distribution and configuration of the genesis of ore 
body in the study areas. Also, this methodology could be applied to various ore deposits for mine 
evaluation. 
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Introduction 

Multiple-point geostatistics (MPS) was applied for three-dimensional (3D) ore body modeling. 
From ore model results, geophysical data were analyzed and mine evaluation was performed. Ore 
body modeling should be presented as reliable results by geological data. Geological data includes 
borehole data that directly reveal the underground structure, geological map and section presented 
by geologists. However, in realizing the 3D underground structure, it is limited to describe 
complex underground characteristics and the genesis of ore deposit. MPS was applied as a solution 
this problem. As a method for analyzing the ore model of MPS with geophysical data, magnetic 
susceptibility model was constructed through cluster analysis and inverse transform sampling of 
property distribution obtained from rock laboratory tests. This property model can be analyzed 
with aeromagnetic data using forward modeling. The proposed methodology (Hong et al., 2018) 
in this study was applied for evaluate ore body modeling through geological modeling, MPS, 
cluster analysis and inverse transform sampling. 

Methodology 

Fig. 1 shows the workflow of the proposed methodology. In step 1, geological modeling is carried 
out to generate training image (TI) using borehole data, geological map, and geological section. 
For realistic models, the TI should reflect the pattern of ore deposit genesis and geological features 
in the study area. In step 2, the SNESim algorithm developed by Strebelle (2002) is applied using 
the generated TI. The SNESim algorithm is used to predict the multivariate distribution obtained 
from TI, a technique based on the relative regular lines between multiple points to solve the 
problem of two-point geostatistics. The various realizations obtained from applying the SNESim 
algorithm are selected through the cluster analysis of Step 3. In the cluster analysis, the 
dissimilarity of the models is calculated by the Euclidean distance based multidimensional scaling 
(MDS). After that, RBF kernel and K-medoids are used to group similar models. Therefore, Rock 
facies models were obtained as a result of simulation for each cluster. For comparison with 
aeromagnetic data, the magnetic susceptibility model is generated from rock facies model using 
inverse transform sampling of the rock laboratory tests. As a result, the magnetic anomaly of 
simulation results is obtained from the forward modeling result of the magnetic susceptibility 
model. The results of that anomaly are compared and analyzed with existing aeromagnetic data 
and geostatistical models. 
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Fig. 1. Workflow of proposed methodology. In step 1, Geological modeling for TI generation is 
applied. In steps 2 and 3, realization results are classified using SNESim and cluster analysis. In 
Step 4, a 3D magnetic susceptibility model is constructed by inverse transform sampling of rock 
properties tests. Finally, the forward modeling results of the magnetic susceptibility model are 
analyzed for magnetic anomaly of the aeronautical magnetic survey. 


Geological setting and aeromagnetic data of study area 


The study area is the Gagok mine, located in the north-eastern region of South Korea (Fig. 2(a)). 
The ore deposit falls under the category of a skam deposit, formed from the contact area between 
carbonate rock and intrusive igneous rock. In general, the deposit is embedded within bed and 
platy structures, with some local pipe deposits (Choi et al., 2010). The Korea Institute of 
Geoscience and Mineral Resources (KIGAM) (2014) conducted an aeromagnetic survey (Fig. 
2(b)). The obtained data were corrected using the International Geomagnetic Reference Field 
(IGRF) correction method before analysis. 
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Fig. 2. (a) Simplified geological map of the Gagok mine area, (b) Magnetic anomaly map in the 
corresponding area after IGRF correction. The area of MPS is depicted by the dotted box. 
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Implementation of the proposed methodology 
Step 1. Geological modeling 

The geological modeling for TI was selected as a region that well represents the geologic features 
of the ore deposits genesis in the study areas. For reliable modeling, geological modeling was 
performed based on borehole data, geological map and geological section. Rock facies were 
classified on the basis of magnetic susceptibility similar to that of ore body in order to compare 
with the aeromagnetic survey as the final result. This is necessary to confirm whether the influence 
of the anomaly in the aeromagnetic survey is caused by the ore body. Therefore, the category for 
the geological modeling is Category 1 for the target ore body, Category 2 and 3 for the Skam and 
Slate with the similar magnetic susceptibility of the ore body, and Category 4 for the other rocks. 
Fig. 3 shows the modeling results for each rock facies and the block model of TI obtained from 
the geological model. Block model of TI is used to implement the SNESim algorithm. 



Fig. 3. In step 1, Geological modeling was applied to generate TI. The TI was set as a reliable area 
based on borehole data, geological map and section. The 3D geological modeling results for each 
category are (a) Ore body, (b) Skam, (c) Slate and (d) Other rocks. Tabular ore bodies was reflected 
in geological information of the azimuth and dip in the study areas, (e) The block model was used 
for SNEsim of MPS. 

Steps 2 and 3. SNESim and Cluster analysis 

The SNESim was performed using Stanford Geostatistical Modeling Software (SGeMS) (Remy 
et al., 2009), and a total of 100 realizations were obtained. These realizations are generated as 
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independent results, but there are similar models. Cluster analysis was performed as a method to 
identify the classification and geological characteristics among these models. Cluster analysis is 
performed using the proposed method by Honarkhah and Caers (2010). Fig. 4 shows the workflow 
of cluster analysis. To find similarities between models, realizations are represented by a two- 
dimensional (2D) map of the distance-based MDS. To classify these points, RBF kernels and k- 
medoids are used as a way to clustering of mapped realizations. Fig. 5 shows the E-type result of 
each cluster as a result of cluster analysis. Clusters 1-3 can be categorized as realistic models 
because the number of classified models in each cluster is large and the distance of representative 
classified models is close to each other. However, clusters 4 and 5 can be classified as unrealistic 
outliers because of fewer models and distance relatively far from other clusters. 
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Fig. 4. Workflow (Honarkhah and Caers, 2010) for the cluster analysis to classify simulation 
results, (a) Calculation of Euclidean distance between two models for all models, (b) Distance 
matrix of all models, (c) Points (models) mapped in the 2D MDS map. (d) Cluster analysis using 
RBF kernel and k-medoids. As a result of the cluster analysis, the realizations are classified into 
five clusters, and representative models can be identified in the center of each cluster. 
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Cluster 3 


Cluster 5 


Fig. 5. Cluster models that indicate spatial patterns using RBF kernel and k-medoids in a 2D MDS 
map. Each representative point indicates the E-type results of the rock facies model in 2D space. 
Clusters 1-3 are relatively close to each other, while Cluster 4 and 5 is distant. Therefore, Cluster 
4 and 5 may be grouped as an outlier, considering the distance of dissimilarity is greater than the 
other clusters and the number of corresponding models is smaller. 


Step 4. Inverse transform sampling 

To perform forward modeling of the geological model as a result of MPS, a magnetic susceptibility 
model is constructed by inverse transform sampling from the rock laboratory tests. As a result, the 
magnetic anomaly of the simulation result using forward modeling is obtained (Fig. 6). 


72 








Extended Abstracts of GeoENV2018 - Belfast, Northern Ireland 


4-5 July 2018 


Cluster 2 



Cluster 3 Cluster 5 


Fig. 6. The magnetic anomaly of each cluster is obtained by forward modeling from a 3D magnetic 
susceptibility model. The overall tendency indicates low magnetic anomaly from south to north. 
The high magnetic anomaly, which is the influence of the ore body, skam, and slate, shows the 
shape of the anomaly depending on the geological model. 


Conclusions 

The ore body modeling using MPS was used to evaluate the ore body through cluster analysis and 
aeromagnetic data. By conducting MPS using a TI, a tabular ore bodies was created that describes 
geological characteristics. These models were classified through cluster analysis, and uncertainty 
errors could be reduced by excluding the outliers. The magnetic anomaly, which is the result of 
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the proposed methodology, represents the overall tendency of the results of aeromagnetic survey. 
Also, more reliable evaluations of the ore body can be obtained via integrated analysis of 
geophysical and geological data, such as borehole data and geological sections. However, it is not 
sufficient as a methodology for the quantitative ore body evaluation such as reserve estimation. 
Therefore, it is necessary to develop a methodology using numerical properties in future studies. 
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1. Introduction 

Approximately 8.9 million Canadians (30.3% of the national population) currently rely primarily 
on groundwater for domestic use, with approximately four million (11.5%) employing a private 
domestic groundwater well as their primary water source (Charrois, 2010; Statistics Canada, 
2016). Private well use is variable across the Canadian provinces and territories, with particularly 
high reliance in the province of Ontario. Recent estimates indicate as many as 511,000 functional 
domestic wells are located within the province (from Current Study) supplying an estimated 1.6 
million Ontarians (Office of the Auditor General of Ontario, 2014). While both the Ontario Safe 
Drinking Water Act (2002) and the Ontario Clean Water Act (2006) set out guidelines for public 
drinking water systems, private drinking water systems, including wells, are not analogously 
regulated. While private well construction is regulated under the Ontario Water Resources Act 
(1990), the responsibility for source condition and ongoing maintenance fall on the well owner. 
Moreover, as is the case in the Republic of Ireland, neither well testing nor treatment are mandated 
within existing legislative tools; accordingly, individual well owners are entrusted to manage their 
own water quality and testing. 

The mechanisms driving the occurrence and transport (and consequent contamination) of both 
groundwater aquifers (the water-bearing geological formation from which a well acquires its 
reserves) and water wells by indicator organisms and enteric pathogens are inherently complex. 
Local climate, hydrological and geological setting and the geographic/topographic location 
including nearby sources of contamination can all impact the quality of groundwater, with 
significant spatiotemporal variation associated with all these variables. While natural subsurface 
attenuation via gradual infiltration and recharge through subsoil and overburden layers affords a 
(varying) level of protection via microbial degradation, predation, filtration, and increased 
residence time (Hynds et al., 2012; Maran et al., 2016), contamination can and does still occur via 
preferential flow, direct ingress (runoff), and flooding (Hynds et al., 2012), due to poorly located 
and/or maintained sources (Maran et al., 2016). For example, the hydrogeological setting in large 
tracts of south-eastern Ontario has long been recognized as being particularly vulnerable due to a 
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preponderance of fractured and/or exposed bedrock (much like that in large tracts of western 
Ireland), providing myriad preferential contaminant pathways to the subsurface (Worthington & 
Smart, 2017). Furthermore, aquifer vulnerability and source susceptibility can and are further 
complicated via social components including stakeholder (e.g. well owner/users, local residents, 
the farming community) knowledge, attitudes and practices (Hynds et al., 2013; Krolik et al., 
2013; Atherholt et al., 2016; Worthington & Smart, 2017). Understanding and accounting for the 
significant levels of variation associated with individual groundwater contamination drivers in the 
first instance, and the concurrent influence of likely and potential driver permutations (e.g. high 
antecedent rainfall combined with medium thickness high permeability subsoils) groundwater 
quality is essential to the prevention of groundwater contamination by enteric pathogens and 
subsequent outbreaks of waterborne infection within the population. While previous studies have 
sought to geo-statistically elucidate these mechanisms, both in Ontario and further afield with high 
levels of variation reported (Embry & Runkle, 2006; Hynds et al., 2012; Krolik et al., 2013; 
O’Dwyer et al., 2014), a majority of these have been characterised by (necessary) spatial and/or 
temporal data limitations, and thus cannot be used to conclusively predict groundwater 
contamination over significant regional or seasonal scales. Moreover, high levels of variation have 
been reported regarding subsurface FIO occurrence, likely due to source selection, study area 
selection, and/or publication bias. The current study set out to address these issues by integrating 
two province-wide datasets collated over a 6-year period using a spatial fuzzy logic approach. Due 
to the extent and spatial distribution of the developed dataset, the authors consider that study 
findings and recommendations will be directly transferable to myriad diverse climatic, socio¬ 
economic, and hydrological regions, and thus of global significance. 

2. Methods 

Public Health Ontario (PHO) is a provincial Crown corporation that provides free well water 
microbial testing of private wells in Ontario, Canada, for both Escherichia coli and total coliforms 
(TC). Groundwater samples (200 mL) are submitted and tested for both indicator organisms (IOs) 
within 48 hours of collection via membrane filtration and culture on differential coliform (DC) 
agar (Public Health Ontario, 2018). Test results are subsequently reported to the local public health 
authority and well owner. Sample-associated data, including source address and IO presence, 
concentration and (where more than one test is undertaken) frequency are concurrently recorded 
and added to a Well Testing Information System (WTIS). 

As previously outlined, provincial regulations ( Ontario Water Resources Act, Regulation 903 
(1990)) govern private well construction in Ontario (MOECC, 2007), with the Ontario Ministry of 
the Environment and Climate Chance (MOECC) maintaining and managing all well construction 
records in the Water Well Information System (WWIS) database. Licensed well contractors are 
obliged to report and document all new well construction data, with property owners are mandated 
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to report any source alterations, repairs, or decommissions to the MOECC. The WWIS dataset 
comprises detailed geological and source-related variables pertaining to all properties utilising a 
private groundwater source in Ontario for future geotechnical and hydrogeological site 
investigations. These include UTM coordinates, watertable depth, overburden profile depths and 
classification, bedrock depth and classification, date and type of well construction, well casing 
diameter, screen depth and length, well use type, water pump test results (static water level/water 
level after pumping/pump test rate in GPM/pump test duration), and aquifer transmissivity. 

While the WTIS and WWIS datasets comprise highly complementary geo-referenced sample 
frames, due to their size, and inherent differences with respect to data management (agencies), 
data usage, and geo-referencing codes, to date, they have not been geographically integrated. A 
multi-step coding process (Figure 1), culminating in a fuzzy-logic join, has been developed to 
ensure that the datasets could be accurately geo-linked and successfully merged into what is likely 
the largest private groundwater source dataset of it kind. 

The publicly available quarterly-updated WWIS dataset comprises 43 regional, comma separated 
value (CSV) files, which were extracted from the compressed WWIS 2016Q4 20170405 file, and 
subsequently merged using RStudio (ver. 1.1.383) and R (ver. 3.3.3). The Universal Transverse 
Mercator coordinate (UTM) variable was separated into Zone, Northing and Easting, with 
geospatial conversion of the separated UTM coordinates for each of the individual zones (15, 16, 
17, and 18) accomplished using the SpatialPoints() and spTransform() functions from the rgdal 
package (i.e. spatial files read into R) with the NAD83 (North American 1983) datum. Resulting 
zonal geocodes (latitude and longitude) were transformed into dataframes, bound to the WWIS 
dataframe, and output as the final WWIS CSV file with 720,098 records of which 511,000 wells 
were designated as domestic plus other use. 

All well testing (WTIS) records between January 1 st , 2010 and December 31 st , 2016, inclusive, 
were exported to MS-Excel and subsequently exported to CSV format before merging using 
R/RStudio. Geo-statistical amalgamation was undertaken using the GDAL (Geospatial Data 
Extraction Library) and PROJ.4 (UNIX filter function for conversion of geographic longitude and 
latitude coordinates into Cartesian coordinates) libraries and a 5-step procedure within the R 
statistical environment. The sample frame geographic identifier (i.e. street addresses of private 
groundwater sources in text format) was available in text format, and thus required 
cleansing/scrubbing prior to successful geo-linking. A multi-step process using the tidyverse and 
stringr packages was employed to detect, correct, replace and (if necessary) remove inaccurate 
and/or incomplete records. Briefly, the source street number, street name, street type, and street 
direction were united into a single ‘clean_street’ variable. ‘Src.City, Src.Town.Munici’ 
(proprietary internal variable) were united as the source ‘city’ variable; ‘postal code’ and 
‘province’ were derived from ‘Src.Postal.Code’; and ‘Src.Province’, respectively. Fused variables 
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were further cleaned by removing NAs, commas and underscores using the gsub() function with 
Regular Expressions {Regex). The cleaned address components; street, city, province and postal 
code, were bound to the original WTIS dataframe and subsequently joined to create a final 
‘clean_address’ variable. The 2010-2016 WTIS merged dataset contained 1,252,922 legible 
records. WTIS records with duplicate barcodes were removed from the 2010-2016 WTIS dataset 
prior to geocoding. The newly created WTIS ‘clean_address’ variable was geocoded with the 
Google Maps Geocoding API using the geocode_url() function from the placement package. 
Returned geocoded results were bound to the WTIS dataframe, with unsuccessful results removed 
via fdtering (Status = “OK”). Results were output as the final WTIS CSV file, with a geocoding 
success rate of >75% achieved using the developed ‘clean_address’ variable. In all, 940,024 
records comprised the final 2010-2016 WTIS dataset. 

Dataframes were created using source latitude and longitude in concurrence with the “Barcode” 
and “WELL” variables from each of the final WTIS and WWIS datasets respectively for the fuzzy 
logic join. The computational power required to perform the fuzzy logic join using the geo_join() 
function from the fuzzyjoin package, required a BASH script to submit task command(s) via 
SLURM to the Frontenac Cluster at the Centre for Advanced Computing at Queen's University, 
Kingston, Ontario. R script computations were performed in parallel using 10 cores, 128 GB of 
RAM, and R (ver. 3.3.3) using R packages: plyr, tidyverse, doParallel, foreach and fuzzyjoin. The 
fuzzy logic join was used to integrate datasets based upon predetermined variable similarities, as 
opposed to absolute equivalence. Several variables may be concurrently employed for geo-linkage 
using the fuzzy logic approach (e.g. string distance, regular expression matching, etc.). The 
WWIS-WTIS merge was undertaken via geojoin() using the Haversine distance calculation 
method (determination of the “great-circle distance” between two points on a sphere) to identify 
matching records set at a maximum distance of 1 km. WWIS records with an identified well 
location calculated at < 1 km from associated WTIS well locations (based on geocode) were 
considered a match. Subsequently, records with the shortest matching distance were selected as a 
matched record, with results output as the final WWIS WTIS CSV file for future analyses. 

3. Results 

In all, approximately 939,000 records (individual well water analyses) over a 6-year period from 
approximately 159,500 individual groundwater sources (1 source/6.7km 2 ) were successfully geo- 
joined using the outlined fuzzy logic approach, with statistics for the distance variable of the 
spatially joined records as follows; minimum distance 0.6 m, 1 st quartile 38.2 m, median 75.3 m, 
mean 126.4 m, 3 rd quartile 157.8 m, and a maximum (permitted) distance of 1000 m. A mean 
sample number of 5.86 samples/well (approximately 1 sample/annum) was found, while 
approximately one-third (n = 46,234) of sources were analysed once over the study period, and 
thus referred to as a unique sample. The maximum number of tests undertaken at one site was 469 
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(i.e. 6.5/month). As such, the developed dataset not only offers a high level of spatial 
representivity, but will also permit high-resolution temporal (time-series) analyses. Overall, just 
over 4% of individual samples were E. coli positive, with a further 26.3% of samples Total 
Coliform positive (in the absence of E. coli). As shown (Figure 2), while a significant proportion 
of contaminated wells exhibited relatively low and temporally limited levels of E. coli (Mean: 
12.04 CFU/lOOml), almost 2,500 samples were associated with a measured E. coli concentration 
>80 CFU/lOOml, thus constituting a high likelihood of faecal ingress and a subsequent public 
health threat. As expected, significantly elevated E. coli detection rates were associated with 
samples taken during summer and fall (Figure 3), primarily due to climactic, hydrological and 
agricultural cycling. 

In successfully combining the WTIS and WWIS datasets, the present study links private well IO 
presence to infrastructural and hydrogeological records, thus enabling a systematic and evidence 
based analysis of groundwater contamination in private wells in Ontario. To date, approximately 
159,500 individual groundwater sources and 940,000 groundwater microbiology ( E . coli. Total 
Coliforms) results across the province of Ontario have been geo-joined with associated well 
construction and hydrogeological parameters. Moreover, efforts are currently underway to further 
join sample frames pertaining to overarching climate, antecedent local weather patterns, landuse, 
surface hydrology and agriculture. Due to the spatiotemporal extent and distribution of the 
developed dataset, it is considered that study results are likely transferable to myriad diverse 
climatic, geological, and hydrological regions. To the author’s knowledge, the developed merged 
dataset represents the largest of its kind globally, and as such, will permit elucidation of the effects 
of (hydro)geological and climatic setting, well design, and sampling regime on groundwater IO 
detection rates to date. Perhaps more importantly, the work carried out to date (and initial 
unpublished findings) highlights the wealth of data available for environmental and geoscientific 
research, and the potency of data science and geo-statisticians for increasing data usability and 
efficacy, thus adding significant value. 
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Fig. 1 . Flow diagram depicting transformation of the independent WWIS and WTIS raw datasets 
to the spatially joined WWIS-WTIS dataset. 
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Fig. 2. Distribution of E. coli positive groundwater samples 
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Introduction 

Visual inspection of deterioration phenomena across the fagade of a historical monument 
demonstrates the spatial variability of weathering processes. This variation in the material’s 
response to weathering stress is response to the material’s sensitivity to change. Previous 
weathering simulation experiments and exposure trials have been used to investigate the 
connection between weathering processes and spatial variability of response. However, the 
material’s stability is a function of both spatial and temporal components. Temporal sensitivity is 
a function of both the magnitude and frequency of formative events and the spatial sensitivity of 
the material (Brunsden 2001). 

Understanding of the links between system sensitivity and temporally dependant environmental 
processes remains vague across all scales of study. In the case of stone weathering studies, this is 
partly due to the demands of the sampling strategies required to capture the variation and the 
timescale of a block’s transition from emplacement to ‘failure’ (Coleman 1981). Due to these 
limitations, the application of spatiotemporal geostatistical techniques is now being examined as a 
potential tool for the study of these processes. 

One area of urban stone decay literature that requires further attention is the initial response of 
quarry ‘fresh’ material to emplacement within a building. Previous small-scale investigations have 
demonstrated that alteration commences within a few months to a year of emplacement 
(Turkington et al. 2003; Maravelaki-Kalaitzaki 2002; Viles 1990). These early changes to the 
material will have a lasting influence upon the development of future weathering processes and 
their spatial distribution. Warke and Smith (2007) suggested that this initial period represented an 
adjustment phase, as stone adjusts to ‘new’ conditions of exposure. Several previous studies have 
suggested that environmental conditions, particularly those resulting from the aspect of exposure, 
will have a significant influence upon initial alteration (Adamson et al 2013). It is therefore 
essential that this factor is explored as part of any study. 

The necessity to investigate both the spatial and temporal components of this transition, well suits 
the capabilities of spatiotemporal kriging tools. This paper will explore the application of 
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spatiotemporal techniques to investigating the deterioration of building stone following 
emplacement. It shall also explore the influence that microclimatic conditions, related to aspect 
will have upon initial weathering response. 

Methodology 

Five sandstone blocks were placed within an exposure frame, located in South Belfast, for a 
duration of one year. One block was exposed to each of the four cardinal points whilst the fifth 
was positioned to represent a horizontal surface on a structure, such as a window sill or balustrade 
(Fig. 1). Permeability measurements were recorded in a regular grid across the exposed surface of 
the block once a month, creating a data set that is both spatially and temporally dense. No previous 
studies have collected a similar quantity of points, across both space and time, to investigate 
weathering processes. 



Fig. 1. Block emplacement and exposed faces for weathering exposure trial setup 

Analysis was carried out using the Gstat package in R. Monthly variograms were generated and 
modelled for each of the blocks, facilitating the investigation of structures that exist within the 
spatial variability of the permeability values. The models were then used to produce kriged 
surfaces that could be used to identify areas of change. 


However, the kriged surfaces are snapshots of the stone’s properties at a point in time. Examination 
of difference surfaces demonstrated that the spatial variability of the material properties varied 
over time. It was therefore concluded that application of the spacetime package to facilitate 
spatiotemporal kriging in Gstat would be beneficial (Pebesma 2012). The data collected from the 
weathering exposure trial creates regular full grid spacetime frames (STF). It must be noted that 
this ‘grid’ represents the layout in spacetime, between sample location and time. 
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The data from the weathering exposure trial will be bound into Spatialpointsdataframe object 
(STFDF), a three-dimensional class using time as an index. This class will be used to generate a 
spatiotemporal variogram. Three families of models, Metric, Separable and ProductSum were then 
tested to identify which would provide the best fit through comparison of the mean square error 
(MSE). 

Results 
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Fig. 2. Variograms for the permeability data from block Exl for each time lag (month). The 
temporal variability of the data set can be seen in the grouping of the lags, A represents the 
block following emplacement, B contains most of the monthly observations except for the final 
month, C. 

Plotting the spatial temporal variogram shows that many of the temporal lags have a similar form, 
however temporal variability increases monthly (Fig. 2). Temporal variance increases from the 
emplacement condition at time To (A) to a new state (B). During the final month of the study, 
variance increases again suggestive of transition to another state (C). 

The wireframes (Fig. 3) show that a similar trend in temporal variance is experienced by blocks 
Ex2 (South) and Ex4 (North). Alternatively, Block Ex3 (West) experiences highly changeable 
variance over the study period. Comparison of mean square error selected the product sum model 
(140.12) over separate (149.14) and metric (141.45) models (Fig. 4). 
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Fig. 3. Wireframe models of sample variograms for the vertical blocks (Exl - Ex4) 

With the product sum model, was is possible to interpolate the surface. Using a modelspace that 
matches with the observation temporal spacing, once a month, surfaces like those produced 
through kriging can be generated. However, when applied to a new model space, with temporal 
spacing that differs from the sampling scheme, then it is possible to interpret surfaces for points 
between the observation events (Fig. 5). 
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Metric Separable Product sum 



Fig. 4. Exl best fit spatiotemporal variograms of Metric, Separable and Product Sum families 
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Fig. 5. Output surfaces for block Exl (East), interpolating permeability (mD) in both time and 
space. 
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Discussion and Conclusions 

The application of only spatial techniques proved to be inappropriate to fully interpret the 
complexity of the changing material properties. Use of spatiotemporal kriging allowed the 
modelling of the sample blocks to illustrate the changing material properties over time. The 
observed variance of the temporal component has improved our understanding of the nature of 
early episodic change within the stone decay system. 

Most of the blocks have responded to emplacement within the urban environment with an increase 
in permeability variance, as expected from an initial adjustment period. After this, variance appears 
to be relatively consistent until the final month, when another transitionary period seems to begin. 
This could be the result of early biological colonisation affecting the surface permeability of the 
blocks. Additionally, comparison of the blocks revealed that the west facing block (Ex3) has 
responded differently than the other blocks. A possible explanation is that this aspect faces towards 
the prevailing wind in Belfast and experiences a significant number of sunlight hours. Both factors 
will affect the wetting and drying of this material and therefore the deterioration pathway. 
Considering these findings, the spatiotemporal approach has provided improved insight into the 
experience of stone post emplacement within the urban environment and is assisting the 
development of conceptual models. 
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Introduction 

The soil profile is a sequence of soil horizons. For classification purposes at any level of taxonomy 
soil surveyors usually first establish genetic horizons in the soil, and then make a conclusion about 
the taxonomic position of the soil. Laboratory analysis either confirms or disproves the field 
diagnostics, but soil division into horizons is the first step in soil mapping. 

The properties of a particular soil horizon vary significantly in space, and commonly do not depend 
directly on easily observed external factors. The objective of this study was to study the spatial 
variation of different horizons’ properties in natural forest soils. 

The study was made at the territory of the Republic of Karelia, Northwest Russia. The study plot 
Veshkelitsa was situated in the subzone of middle taiga. The study plot was established in hilly 
glacial and glaciofluvial landscapes. The plot was characterized by complex topography with 
abundant depressions occupied by lakes and peatlands. The dominant tree species were birch, 
aspen, pine and spruce. The soils of the plot included Albic Podzol, Endogleyic Albic Podzol, 
Entic Podzol, Dystric Gleyic Cambisol, Gleyic Histic Cambisol, Dystric Histic Gleysol, Umbric 
Gleysol, Dystric Ombric Fibric Histosol, and Dystric Ombric Hemic Histosol. 

We studied the spatial variability of various soil horizons, using the data of soil surveys of a plot 
with an areal km2. Sampling lag was 100 m. In the course of studies, the depth of O, A, E, B and 
T horizons was measured and physical properties of soils were determined in each sampling point. 
Soil monoliths were cut using a bore with a drill bit of 100 cm3. The collected drill samples were 
weighed to determine their density. Water content was determined by gravimetry. The water 
content in peats was calculated using the same method as for other horizons. Under laboratory 
conditions, the following parameters were determined: pH by potentiometry in a 1 M KC1 extract 
and the content of soil organic matter by C:N analyzer. 

Spatial variability of soil properties was determined using the geostatistical method for estimating 
the relationship between the variance of the properties and the sampling interval. The resultant 
data were employed to plot variograms - curves showing the relationship between semivariance 
□ (h) and shift values h. 

The statistical analysis showed that all the horizons strongly varied in depth and their chemical 
and physical properties. For mineral soils the highest level of variation was recorded for litter layer. 
The properties of litters are mainly affected by local factors manifested in all biogeocenoses, 
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including the heterogeneity of plant cover. The determining factor was the topographical position 
of sampling points (the lower or middle part of the slope) and the type of vegetation (deciduous 
vs. coniferous forest). However, there was not any statistically significant dependence between 
variables of interest and location of sampling point. 

Genreally spatial non-uniformity of mineral soil horizons depended on their depth. The topsoil 
horizons had the highest level of variability, and the parent material - the lowest one. In addition, 
the highest level of heterogeneity was recorded for the moisture content and the soil organic matter 
content of horizons. 

In organic soils the highest level of heterogeneity was recorded for the density and moisture 
content of horizons. High values of field water content (2700% of dry soil) are noted in the TO 
horizon. Such values are typical for oligotrophic peats, the water capacity of which can reach 
3000%. In the TE horizon this value is 1500%. The water content in the upper horizons of peat 
soils varies in a wide range exceeding 1000%, which reflects the nonuniform accumulation and 
evaporation of atmospheric water in litter and peat horizons. A normal distribution is confirmed 
for water content in most horizons. The maximum water contents are noted in points with local 
overmoistening confined to relief microdepressions. The coefficients of variation determined for 
the water content in the upper horizons of peat soils are higher than those found for mineral soils 
in 1.5-2 times. 

The change in density and water content down the soil profile is related to the increase in the 
degree of peat decomposition. The variability in the properties of the organic horizons depended 
on the botanical composition and degree of humification of the organic material. Factor analysis 
showed that there were statistically significant differences in all soil properties between Fibric and 
Hemic Histosols. 

The spatial distribution of organic carbon content in the upper horizons is characterized by 
variation coefficients of 4% in peat soils to 40% in mineral soils. The variation of carbon content 
is related to changes in the composition of plant residues under different biogeocenotic conditions 
in litter and to heterogeneous peat accumulation. 

For all horizons the smallest level of variation was recorded for soil acidity. The studied soil has a 
very strong and strong acid reaction. The highest acidity is noted in the oligotrophic peat soil. 
Geostatistical methods were used for spatial interpolation of data. For Histosols the application of 
geostatistical methods was not very effective. The Histosols were distributed discontinuously, as 
small separate paths in depressions or around lakes. 

For all properties of mineral horizons, high sill/nugget ratio indicates that most variation takes 
place at characteristic distances less than 100 m. Evident anisotropy was revealed for all the 
properties of mineral horizons. Presumably, this anisotropy might be associated with the relief. 
The direction of the greatest variation of the properties coincides with the direction across the 
glacial ridges: periodicity is clearly expressed in this direction. 

The revealed spatial variation features of forest soil properties can be used for the optimization of 
sampling procedures and be a useful tool for correcting the data of traditional soil mapping. Further 
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research will be needed to elucidate the origin of the high randomness and variability in soil 
horizons properties for natural forest soils and establish quantitative statistical relations with 
landscape variables 

The research was supported by the grant of the Russian Science Foundation, project No. 17-17- 
01293. 
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Abstract 

The Shark Bay Scallop Fishery is Western Australia's largest scallop fishery. The catch depends 
primarily on the strength of recruitment from the breeding season of the previous year and this 
fishery saw a severe decline in scallop abundance in 2011 after an extreme marine heatwave. In 
this paper indicator variables are used to discretise the a nn ual scallop distribution and to identify 
annual high abundance thresholds for the fishery via indicator variograms and cross variograms. 
Indicator cokriging of the indicators representing annual hotspots is used to model the spatial 
distributions of high scallop abundance across years. This study will consider whether spatial 
patterns prior to the extreme event persist in the years of partial and full recovery and may be 
useful in assisting with improved catch predictions and total allowable catch (TAC) setting 
processes that have been applied since the fishery re-opened. 

Introduction 

The Shark Bay scallop fishery is based on a single species, Ylistrium balloti, and is the most 
valuable scallop fishery (AU$10-57 million) in Western Australia (Kangas et al., 2011). This 
species is short-lived (2-3 years), has fast growth and highly variable recruitment which is 
primarily environmentally driven. Some contributing environmental factors are the strength of the 
Leeuwin Current and water temperature (Lenanton et al. 2009). In a recent report (Caputi et al 
2015a,b) scallops were identified as one of the species as having high sensitivity to climate change. 
The characteristics of the Shark Bay environment that make the region conducive for saucer- 
scallop settlement and survival are the shallow sandy embayment and fairly stable oceanic 
conditions in the deeper waters of the Bay (Logan et al. 1974). 

The stock is assessed by means of a fishery-independent survey conducted between October and 
December (historically when the fishery was closed). The survey samples the scallop population 
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by the two age groups: recruits (0+, scallops of size up to 82 mm) and residuals (1+ or older, size 
above and equal to 83 mm). The sampling region covers the two fishing grounds. Shark Bay North 
and Denham Sound (Figure 1) which are separated by seagrass beds. 



X (nmil) 

Fig 10. Shark Bay scallop fishery survey sites, coordinates are relative to a local coordinate system 


Annual electronic records are available for each year since 1997, and the years 1985, 1987, 1992 
and 1993. The annual survey provides size and abundance information from 119 survey trawl sites. 
The data are used to determine an index of recruitment strength during that year. They also provide 
an index of the size of the residual stock. The prediction of the catch for the following year is based 
on recruitment strength and abundance of residual stock. In earlier papers the spatial characteristics 
of the scallop population from 1997 to 2010 were described via indices based on surveys over time 
(relative abundance of recruits, latitude and longitude of the centre of gravity, inertia, anisotropy, 
and global index of collocation). The results confirmed the high variability of the scallop 
distribution over time with changes in the location of areas of high abundance from year to year 
(Mueller et al, 2012a, 2012b). 

An extreme weather event in the summer 2010/11 (Caputi et al, 2016) resulted in a marked 
reduction in abundance of the fishery resulting in closure of the fishery in 2012 for three and a half 
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years. It was considered to be sufficiently recovered between 2015 and 2017 to permit low levels 
of fishing and in 2018 has fully recovered. In this paper indicator variables to discretise the annual 
scallop distribution and to identify annual high abundance thresholds for the fishery via indicator 
variograms and cross variograms. Indicator cokriging of the indicators representing annual 
hotspots is used to model the spatial distributions of high scallop abundance across years. (Petitgas 
et al, 2016). This study will consider whether spatial patterns prior to the extreme event persist in 
the years of partial and full recovery. 

The definition of thresholds for hotspots 

There are several definitions for hotspot cutoff available in the literature. The first is the method 
introduced by Bartolino et al (2011) which is based on the cumulative frequency of the variable of 
interest and defines the hotspot cutoff to be the greatest value z for which the slope of the 
cumulative frequency distribution is 1. Here the spatial distribution is ignored, but the value z 
defined as the hotspot cutoff can be used to define an indicator random function for mapping the 
spatial distribution of the probability of exceeding the cutoff. 


I 


-.‘“Hi 


An alternative is based on geostatistical principles and was first proposed by Petitgas(1993). This 
method has been further developed by Petigas et al (2016) and is briefly described below. The 
spatial distribution of the variable of interest is assumed to be represented by a random function 
Z(u ) where u denotes a location within the study region consisting of the two fishing grounds 
described earlier. Associated with Z is a sequence of thresholds z u i = 1, such that z x < 
z 2 ... z n _i < z n and indicator random functions I z . such that 

[1, Z(u)>Zi 

Z(u) < z t 

which define nested sets A t of points with the property that Aj c A t for j > i. The determination 
of geostatistical hotspots makes use of two properties of indicator variograms and cross 
variograms: The indicator cross variogram Yij(h) can be interpreted as the probability of entering 
Aj from outside A t when the separation distance is assumed to be h and the ratio Yij(.h)/Yi(h ) as 
the transition probability of being in Aj when entering A { at distance h. The set chosen to represent 
a hotspot is the set A t where / is the minimum index for which Yi(i+i)(h)/Yi(h) is flat. As the 
random function Z is only available at sample locations, spatial interpolation can be applied to the 
indicator random function I z . can be used to determine a map of the hotspot(s) by depicting the 
probability of exceeding the threshold z t . In the case when a time series of survey data is available 
cutoffs are defined for every year and the family of hotspot indicators is modelled jointly via 
cokriging based on a linear model of coregionalisation. 


Data 
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The data on which this study is based comprise records for 25 fishery independent surveys 
conducted between 1985 and 2017 in Shark Bay. For each of the years and for both residuals and 
recruits the deciles of the distribution were used to define indicator thresholds to determine the 
annual hotspot cutoff. This cutoff typically corresponds to the 80 th percentile of the distribution. 
It should be noted that the cutoffs are highly variable ranging from 4 scallops per nautical mile in 
2013 to 1081 in 2007 for recruits and from a low of 11 in 2014 to a high of 1293 in 1992 for 
residual scallops. For the overall scallop abundance the range is 7 to 1483 (Table 1). 


Table 5 Summary statistics and hotspot thresholds for recruit, residual and total scallops (catch per nautical mile) 




Recruit Scallop; 

s 


Residual Scallop 

s 

Total Scallops 

Year 

Count 

Min 

Mean 

Max 

Cutoff 

Min 

Mean 

Max 

Cutoff 

Min 

Mean 

Max 

Cutoff 

1985 

62 

0 

131 

960 

206 

0 

39 

268 

88 

0 

170 

963 

272 

1987 

68 

0 

83 

548 

147 

1 

557 

3112 

920 

3 

639 

3171 

1067 

1992 

69 

3 

472 

5769 

528 

10 

866 

3719 

1293 

23 

1338 

6260 

1977 

1993 

68 

0 

202 

1636 

395 

22 

235 

760 

388 

32 

437 

1787 

632 

1997 

61 

0 

105 

610 

129 

0 

17 

111 

31 

0 

122 

721 

162 

1998 

62 

0 

44 

1693 

26 

0 

90 

1334 

101 

0 

134 

1756 

161 

1999 

62 

0 

252 

2864 

250 

0 

22 

180 

51 

0 

274 

2940 

273 

2000 

48 

0 

139 

852 

227 

0 

55 

593 

66 

0 

195 

1213 

288 

2001 

43 

3 

120 

450 

195 

1 

28 

141 

40 

5 

149 

553 

248 

2002 

65 

0 

164 

2863 

189 

0 

35 

345 

63 

2 

199 

2966 

241 

2003 

72 

0 

258 

2359 

488 

0 

90 

1233 

130 

5 

348 

2524 

535 

2004 

75 

0 

103 

1510 

114 

0 

136 

607 

194 

0 

239 

1877 

327 

2005 

74 

0 

85 

508 

163 

3 

138 

844 

218 

9 

223 

900 

343 

2006 

78 

0 

496 

4032 

879 

0 

212 

3530 

115 

6 

708 

4077 

1239 

2007 

58 

0 

643 

5823 

1081 

0 

278 

1181 

521 

3 

921 

6394 

1483 

2008 

69 

1 

308 

1485 

544 

0 

263 

1228 

410 

4 

571 

1726 

1056 

2009 

75 

0 

161 

1535 

236 

0 

95 

587 

160 

0 

256 

1782 

464 

2010 

83 

0 

194 

1976 

262 

0 

24 

235 

39 

0 

218 

1992 

285 

2011 

82 

0 

31 

224 

52 

0 

0 

6 


0 

32 

224 

52 

2012 

73 

0 

20 

343 

20 

0 

0 

1 


0 

20 

343 

21 

2013 

74 

0 

6 

80 

4 

0 

0 

9 


0 

7 

80 

7 

2014 

77 

0 

140 

914 

216 

0 

9 

99 

11 

0 

149 

982 

221 

2015 

82 

0 

184 

1707 

319 

0 

54 

454 

70 

0 

238 

1724 

408 

2016 

81 

0 

410 

5134 

429 

0 

122 

1178 

139 

0 

533 

5798 

720 

2017 

78 

0 

306 

2519 

384 

0 

297 

2130 

440 

0 

603 

2736 

946 


The statistics in Table 1 show the high variability of the scallop abundance with distributions that 
are generally strongly positively skewed (coefficients of skewness are not shown here). 


Although there are two fishing grounds that do not interact, the data were modelled jointly for this 
study. The main reason for this is the low overall sample size, which would result in an inability 
to infer viable variograms for the Denham Sound where the number of samples ranged from 9 to 
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40 throughout the time period under consideration. There is also no biological reason for treating 
the fishing grounds as separate. For the computation of variograms pairs of samples coming from 
different fishing grounds were discarded and for estimation the seagrass beds were treated as hard 
boundaries so that estimation was only based on samples within a fishing ground. The hotspot 
indicators were modelled with an isotropic linear model of coregionalisation (LMC) consisting of 
a nugget and two spherical structures. For recruits of ranges of the spherical structures were 6.6 
nmil and 30 nmil, for residual scallops 5.4 nmil and 30 nmil and for the overall scallop catch per 
nautical mile 6.3 nmil and 30 nmil. Indicator cokriging was applied followed by a correction of 
order relation violations. 

Results 

Spatial maps of the locations of potential hotspots are shown in Figs 2 to 4.The locations differ 
markedly among years. Hotspot locations for the total scallop catch are located in Shark Bay North 
from 1993 to 1999. The years from 2000 to 2010 are characterised by both fishing grounds 
showing hotspot locations in contrast to the earlier year generally having regions of high 
abundance in the Shark Bay North and the years from 2012 onward only having hotspots in 
Denham Sound. For recruits, patterns are similar, but there is a re-emergence of minor hotspots 
in Shark Bay North from 2014 onward. For residuals it is only in 2017 that a hotspot is present in 
this fishing ground. For Denham Sound hotspots are present from 2014 onward, suggesting that 
the recovery of the stock in the more southern Denham Sound had an earlier on-set than in the 
more northern fishing ground. 

A chart of the size of the scallop hot spots across time is shown in Fig 5. This chart again highlights 
the variability of the sizes of the hotspots as well as the predominant location of hotspots in Shark 
Bay North up to the extreme weather event, with the change to Denham Sound as the ground with 
greater abundance form then on. 
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Fig 11. Probability of occurrence of a scallop abundance hotspot (turquoise=0, red =1, the colour scale is in increments 
of 0.1), coordinates are in nautical miles relative to a local coordinate system 
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Fig 12. Probability of occurrence of a recruit hotspot (turquoise=0, red =1, the colour scale is in increments of 0.1), 
coordinates are in nautical miles relative to a local coordinate system 
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Fig 13. Probability of occurrence of a residual scallop hotspot (turquoise=0, red =1, the colour scale is in increments 
of 0.1), coordinates are in nautical miles relative to a local coordinate system 


101 



























Extended Abstracts of GeoENV2018 - Belfast, Northern Ireland 4-5 July 2018 

Scallop hotspot size (nmil 2 ) 
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■ Denham Sound □ Shark Bay North 
Fig 14. Size of scallop hotspot across years by fishing ground 

Discussion and Conclusion 

The results of this investigation of hotspots for the Shark Bay Scallop fishery have highlighted the 
spatial variability in locations of high scallop abundance over time. Areas of hotspots are also 
highly variable in size. The years from 2000 to 2010 are characterised by both fishing grounds 
showing hotspot locations in contrast to the earlier year generally having regions of high 
abundance in the Shark Bay North and the years from 2012 onward only having hotspots in 
Denham Sound. Consideration of the hotspot maps for recruits indicates a re-emergence of recruit 
hotspots in the northern fishing ground from 2015. Given that recruit scallops generally make up 
the majority of the total scallop catch, the hotspot maps should be read in conjunction with maps 
of the local proportion of recruit scallops and abundance maps. 

The extreme weather event led to the closure of the fishery for a period of three years from 2012 
to 2014. The fishery re-opened in part in 2015 and has been fully open for fishing since 2016. The 
reopening was accompanied with a change in the management of the fishery which used to operate 
on a limited fishing season during February and March each year until 2011. Instead the fishery is 
now open for the entire year, but operates under a quota system to ensure adequate spawning spot 
for subsequent years. The geostatistical definition of hotspots which explicitly accounts for spatial 
relationships in the data might provide an effective means for the delineation of spatial closures, 
for example regions of low abundance could be closed to fishing to allow stock recovery. The 
spatial maps of recruit and residual scallop hot spots further allow closures to allow adequate time 
for spawning by residual stock or growth of recruit scallops to a commercially attractive size. 
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Given that recruit scallops generally make up the majority of the total scallop catch (Mueller et al 
2012a,b), the hotspot maps should be read in conjunction with maps of the local proportion of 
recruits to further enhance the decision making. 
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Abstract 

In order to fully grasp the knowledge and role of permeability for coal seam gas development, the 
understanding of characteristics of cleat main attributes such as aperture, spacing, orientation, and 
cleat intensity or the number of cleat per unit length along a coal seam sample is needed first. 
These attributes can be established in the subsurface layer of coal albeit not a simple task, for 
example: fracture or cleat intensity. An alternative approach has been conducted for years by the 
application of cleat aperture-size distributions to compare samples from coal seam outcrops and 
subsurface samples, as well as samples from different locations. This research used similar 
approach to estimate the comparison of cleat distributions based on Muaraenim Formation coal 
seams in two different locations that represented the condition of South Sumatra sub-basins; 
mainly in Musi Rawas Utara area representing Northern part of Palembang sub-basin, and in 
Tanjung Enim area representing Southern part of Palembang sub-basin. Cleat attributes (Aperture, 
Spacing, and Cleat Orientation) data was obtained through measurements along the scanline of 
coal seam outcrops of Muaraenim Formation at coal mine sites. The analysis is carried out by 
using statistical methods to help further establish cleat characteristics in Muaraenim coal seams. 
Cleat characteristics will be useful to help illustrate how face and butt cleats develop, and also its 
role in the development of coal seam gas extraction, such as Coal Bed Methane (CBM), in South 
Sumatra. 

Keywords'. Cleat, Geostatistic, Coal, South Sumatra 

Introduction 

Coal is one of main commodities of energy resources in Indonesia, especially in Sumatra region. 
South Sumatra basin hosted the largest and economically significant coal deposits of the island, 
with rank mostly consisted of low rank coal, such as Lignite to Sub-bituminous. As of 2014, it 
was estimated that 124.8 billion tons of coal resources spread all across Indonesia, with South 
Sumatra contributed to 50.23 billion tons of those resources (Tobing, et al., 2014). Beside coal 
deposits, some conventional oil and gas exploration drillings at much greater depth on this basin 
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have also indicated existence of Coal Seam Gas (CSG), otherwise known as Coalbed Methane 
(CBM) within coal situated above these oil and gas deposits. It is believed that South Sumatra 
tertiary coal deposits contain most of CGS reserves of Indonesia. (Sosrowidjojo & Saghafi, 2009). 
Numerous attempts of explorations have been conducted over the years to evaluate the properties 
and feasibility of CSG production in South Sumatra (Tobing, et al., 2014) (Sosrowidjojo & 
Saghafi, 2009). But, with lack of proper knowledge and methods to extract them, those efforts only 
produced limited results. Still, the prospects of CSG as alternative future energy resources of 
Indonesia seems very likely. 

Coal is a heterogeneous earth material in which fractures mainly formed naturally. This 
phenomenon allowed fluid and gas to flow and or transported from within coal seams through 
microporous coal matrix or systematic joint sets of fractures, which do not extend into adjacent 
clastic facies, known as cleats. Cleat, a term first used in 19 th Century, provides well-connected 
pathways for fluid to be transported within coal which made them have significant influence 
towards permeability of coal seams during dewatering process (Weniger, et al., 2016; Brook, et 
al., 2016). 

Cleat often classified into two kinds, based of its orthogonal fractures to bedding: Face Cleat which 
typically dominant, parallel to maximum horizontal compressive stress direction and formed 
during early stage of coalification; while Butt Cleat formed during relaxation of initial stress field 
and orthogonal or nearly orthogonal to face cleat (Rodrigues, et al., 2014; Brook, et al., 2016). The 
exact origin of cleat still caused considerable debate, but regardless it is widely accepted that 
geometry of cleat is influenced by tectonic stress field (Widera, 2014). Cleats can be further 
classified by attributes and characteristics, such as size, spacing, aperture, or connectivity. They 
can be related to coal rank and type (Weniger, et al., 2016). Permeability in coal seams is affected 
by such attributes in a way or another, for example cleat aperture help increasing coal permeability 
while cleat spacing decreasing coal permeability. Permeability is also higher along face cleat than 
butt cleat (Paul & Chatterjee, 2011). 

Cleat characteristics and attributes became important subjects as coal acts as both a source and 
reservoir rocks for natural gas, in this case CSG. Many researches have stressed out the importance 
of these attributes to control gas and fluid flow within coal seam. Attributes such as spacing 
showed an increasing trend along increasing coal bed thickness as well as increasing cleat spacing 
in dull coal lythotypes and low rank coal (lignite - bituminous) than in bright coal lythotypes and 
high coal rank (Anthracite). Cleat spacing was also increasing linearly along cleat height. 
(Weniger, et al., 2016; Brook, et al., 2016). 

Cleat aperture-size, especially micro opening-mode fractures, are quite hard to be determined 
precisely without proper tools. On surface level, it can be limited to less than 0.1 mm; while in 
subsurface some cleats were filled with diagenetic minerals with aperture-size preserved locally 
to as much as 0.5 cm. Cleat apertures are often linear to cleat heights. Coal with large cleat 
apertures tends to have large height (Laubach, et al., 1998). Number of fractures per unit scan line 
length, or fracture intensity, varies between surface and subsurface samples. Intense fracturing can 
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help gas production due to faster diffusion in coal matrix, but larger apertures can also increase 
water production partly due to high permeability coefficients (Weniger, et al., 2016). 

In this paper, the main objective is to evaluate and quantify cleat attributes and characteristics such 
as orientation, spacing, aperture, and cleat type on surface level. Natural network of cleats can be 
found within coal seams of Muaraenim Formation of South Sumatra basin. Muaraenim formation 
was a part of Tertiary sequence formed during Late Miocene to Early Pliocene (Pulunggono, 
1986). It consists of claystones, shales, and sandstones with coal beds, which deposited in shallow 
marine-brackish to non-marine environments. Muaraenim Formation is a product of regressive 
cycle of deposition that marked the upper stratigraphy of the basin (De Coster, 1974; Sosrowidjojo 
& Saghafi, 2009). 

The observation was focused on coal seam outcrops of Muaraenim Formation at two separate 
locations representing parts of South Sumatera basin, especially Middle Palembang group, which 
Muaraenim formation is a part of (Sosrowidjojo & Saghafi, 2009). The first one is in Tanjungenim 
Coalfield of PT. Bukit Asam, Tbk. at the Southern part of South Sumatra basin; and at Musi Rawas 
Utara where coal mining is undertaken by PT. Tri Aryani in Rajawali opencast coal mine and 
represents Northern part of South Sumatra basin. There is a trend of decreasing number of coal 
beds and thickness from south area of South Sumatra basin to north of South Sumatra basin (De 
Coster, 1974). 

Methods and Data 

Most of cleat data was obtained through field observations on coal seam outcrops of Muaraenim 
Formation with the age of Middle Miocene to Early Pliocene. The Stratigraphy of Muaraenim 
formation mostly composed of sandstones and mudstones with layers of coal seams formed in 
deltaic environments (Pulunggono, 1986; Barber, et al., 2005). Data was gathered from two 
different locations with same range of coal ranks: lignite to sub-bituminous or bituminous. These 
two locations was chosen partly because they have active coal mining sites that unveiled fine 
outcrops of coal seams and visible features of cleats on the roof and or wall of coal beds. The 
geological features of each locations was also put into consideration as each sites is associated to 
nearby well-known geological structures. Tanjungenim coalfield is often associated with 
Muaraenim anticlinorium, while Musi Rawas Utara can be associated to North Palembang 
anticlinorium (Fig 15). Although, during field sampling the association of these structures, mainly 
in Rajawali opencast mine in Musi Rawas Utara, were not really clear as much as we have hoped 
beside opening-mode fractures, which is the main topic of this paper. 
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Fig 15. Stratigraphy and Surface geological map of South Sumatra basin with estimated areas of 
cleat observation, modified from Pulunggono (1986). 

ID Scanline method was used to collect cleat data in each locations. Scan line section transect coal 
seam along 8 m length with intervals of 1 m each. The measurements were later confined to lxl 
m area within each intervals to help simplify field observation of cleats. Cleat attributes was 
measured along linear traverses and includes basic attributes such as: orientation (strike and dip), 
spacing, and apertures. Some height data were also measured in Tanjungenim coalfield, while in 
Rajawali mine no data measured due to poor execution during field observation. Scan lines in mine 
outcrops were identified, measured, and photographed with some included in this paper. 

Cleat attributes measured from Rajawali opencast mine have been collected from two scan lines, 
each with different position in coalbeds: at the roof of coalbed and at coalbed walls. For first scan 
line, 105 numbers of face and butt cleat data has been recorded. For second scan line, 87 numbers 
of cleat orientations and 101 numbers of face and butt cleats attributes have been recorded (Fig. 
16 ). 
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Fig. 16. Typical Cleat patterns of Rajawali Coalfield, South Sumatra 

While in Tanjungenim coalfield, scan lines distributed from three different outcrops in sites of 
seams: Suban, ALP, and Banko. It should be noted that number of seams commonly deposited in 
Muaraenim Formation can reached 12 seams or beds with some found in Tanjungenim coalfield. 
Only one main seam found in Rajawali mine. Seam A1 of Tanjungenim coalfield, in particular, 
has similar characteristics to coal seam of Rajawali mine, such as: association to layers of clayband 
and overlain by thick sequence of sandstones and mudstones which were used for comparison. 
Cleat data collected from these locations varies from 100 to 270 numbers recorded (Fig. 17 ). 



Fig. 17. Coal outcrops of ALP and Banko, Tanjungenim coalfield with cleat patterns 

Prior to this, many researches have focused on core & borehole samples instead of collecting data 
at surface level such as coal outcrops. Only a handful of researches used outcrops data to help 
quantitatively characterize cleat networks. Borehole samples are used in order to obtain more 
accurate and reliable results, but this method is rather expensive and time-consuming especially 
for peoples from outside of corporate and industrial practices. Following sampling methods 
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applied by Ortega, et al. (2006) and Brook, et al. (2016) among others, scanline data set needs to 
be sorted and distributed so characteristics of coal cleat of Muaraenim formation can be determined 
and helps to understand permeability of coal seams. 

To achieve such results, statistical methods and approach was employed in order to compare cleat 
distributions within coal seams of Muaraenim formation. R programming language and Rstudio 
program are used to quantify and present data set and model. Following steps for cumulative- 
frequency fracture-size distribution laid out by Ortega, et al. (2006) became one of basis used for 
the analysis. The process includes: (1) list and sort all measurements of cleat attributes, (2) simplify 
and normalize the data, (3) obtain parameters of model and plot data to provide graphic display of 
cleats distribution. 


Results and Discussion 


Cleat spacing distribution among these locations are visualized through histogram and normal 
density curves. Histograms (Fig. 18) showed that most cleat spacing data obtained from field 
observation are positively skewed or skewed right. These distributions tend to heaped on the left 
side of numerical values with longer tail on the right. They are asymmetrical and has large number 
of occurrences in lower value cells, which indicates smaller numbers of cleat spacing along the 
scanline and a possibility of higher permeability in coal seam. 


TRA1 (Face Cleat) 


TRA2 (Bun Cleat) 


ALP (Unknown) 



1 2 3 4 5 6 7 



1 2 3 4 5 6 7 8 



0 12 3 4 


Spacing (an) 


Spacing (an) 


Spacing (on) 


Suban (Unknown) 


Banko (Unknown) 



Fig. 18. Histograms of Cleat Spacing density based of raw observation data in Rajawali Coal Mine 
and Tanjungenim Coalfield 

Cleat spacing distributions in Rajawali coalfield are properly skewed right with large occurrences 
ranges from 2 cm to 3 cm. Cleat spacing distributions in Tanjungenim coalfield are more extreme 
with positive skew and heap bunched against bottom end of scale which is classified as reverse J 
distribution on Suban and two groups of skewed heaps that showed a negative skew and can be 
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classified into U-shape (bimodal) or double-peaked on ALP cleat data. Cleat spacing distribution 
on Banko is also skewed right with large occurrences ranges from 1 cm to 2 cm. 

Cleat aperture distributions are also visualized through histograms and normal density curves with 
most distribution skewed right and heap stacked in lower value cells (Fig. 19). They are also 
asymmetrical. Distributions of cleat apertures in Rajawali coal mine shows similarities between 
its distributions and cleat spacing distributions of the coal mine. Size of apertures within Rajawali 
coal mine are more inclined towards small scale with face cleat spacing distributed wider than butt 
cleat. 


TRA1 (Face Cleat) 


TRA2 (Butt Cleat) 


ALP (Unknown) 



Aperture (cm) 


Aperture (cm) 


Aperture (cm) 


Suban (Unknown) 


Banko (Unknown) 



0 02 0 04 0 06 0 08 0 10 012 014 



0 02 0 04 0 06 0 08 0.10 0 12 0.14 


Aperture (cm) 


Aperture (cm) 


Fig. 19. Histograms of Cleat Aperture density based of raw observation data in Rajawali Coal 
Mine and Tanjungenim Coalfield 

Meanwhile, cleat aperture distributions in Tanjungenim coalfield showed occurrences of missing 
numbers in ALP and Banko scales with data tends to stack into multiple clusters. Aperture 
distribution on Suban also showed numbers of data stacked into extreme heap on the left side of 
scale with large occurrence between 0.06 to 0.07 cm scales. It is perhaps due to apertures 
encountered along scan lines of Tanjungenim coalfield are smaller in size and nearly identical than 
in Rajawali coal mine. Vagueness of cleat apertures grouping between face and butt cleat in 
Tanjungenim coalfield also contributed to these occurences. It also indicated that cleat aperture 
within Tanjungenim coalfield doesn’t provide large fluid and gas flow than its counterparts in 
Rajawali coal mine. 

Summary of cleat data obtained through ID Scanline methods are presented in Table 6. As showed 
earlier through histograms and normal curves, cleat spacing and apertures measured in Rajawali 
coal mine and Tanjungenim coalfield resulted in different and varied distributions of each sites. 
Summary of statistical descriptions such as Minimum value. Maximum value. Medians, and 
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Standard deviations was produced to give further insights into each data and correlate to previous 
figures. 

Spacing data measured from these two locations showed varied measurements with the smallest 
cleat spacing recorded is 0.1 cm and maximum spacing between 3.9 cm to 7.5 cm in Tanjungenim 
Coalfield. Maximum spacing of face and butt cleats recorded in Rajawali coal mine are 6.72 cm 
and 7.89 cm with smallest are 0.7 cm and 1.12 cm. Standard deviation of face cleats is 1.09 and 
butt cleats is 1.04. The difference between face cleat and butt cleat spacing means that samples of 
butt cleats are poorly distributed on the coal bed of than face cleats. Table 6 also shows limitations 
of visible cleat attributes measured along the scan line, providing the basis to classify these 
fractures into macrocleats. 

While maximum apertures of both face and butt cleat at Rajawali coal mine are 3.8 to 3.9 mm in 
size, smallest aperture measured of 0.3 mm, and median of 1.3 to 1.5 mm, with standard deviation 
of 0.06 were also measured. Aperture-size of face and butt cleats are similar with face cleat 
aperture-size slightly bigger than butt cleat. Maximum aperture of Suban, ALP, and Banko of 
Tanjungenim Coalfield are 1.5 mm, 1.5 mm, and 1.4 mm respectively. Smallest aperture are 0.1 
to 0.2 mm. Median are 0.7 mm, 0.2 mm, and 0.6 mm, while standard deviation are 0.2, 0.3, and 
0.3. 

As previously stated, common cleat aperture-size ranged from less than 0.1 mm on surface level 
to 0.5 cm in subsurface depending on certain conditions and tools used to determine those sizes. 
In Rajawali coal mine and Tanjungenim coalfield, only visible cleats were measured with aperture- 
sizes larger than 0.1 mm. Some external factors such as diagenetic minerals filing and active coal 
mining have also contributed to large aperture-size found in those sites. They also further cemented 
implications of bigger cleat height and pathways for fluid flow within body coal seam, with bigger 
aperture-size measured at Rajawali than at Tanjungenim in general. 

Table 6. Summary values of cleat spacing and aperture measurements of Rajawali coal mine 
(TRA) and Tanjungenim Coalfield (ALP, Suban, and Banko) 


Site 

Names 

Type 


Spacing (cm) 



Aperture (cm) 


Min 

Max 

Median 

Standard 

Deviation 

Min 

Max 

Median 

Standard 

Deviation 

TRA1 

Face 

0.70 

6.72 

2.91 

1.09 

0.03 

0.39 

0.15 

0.06 

TRA2 

Butt 

1.12 

7.89 

2.93 

1.04 

0.03 

0.38 

0.13 

0.06 

SUBAN 

Unknown 

0.10 

7.50 

0.90 

0.86 

0.01 

0.15 

0.07 

0.02 

ALP 

Unknown 

0.10 

3.90 

2.80 

1.24 

0.01 

0.15 

0.02 

0.03 

BANKO 

Unknown 

0.10 

5.70 

1.50 

1.09 

0.02 

0.14 

0.06 

0.03 
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Fracture intensity of coal seams between Rajawali and Tanjungenim coalfield can also be 
determined from the distribution of aperture-size. Variations and similarities of data distribution 
including minimum aperture size is used in a scale independent fracture intensity estimates. This 
estimation can be used to compare different volume of rocks with a common minimum size of 
cleat aperture. ALP and Suban share a same minimum aperture with 0.1 mm, while Face and Butt 
cleat of Rajawali mine also share same minimum size of fracture of 0.3 mm. To correlate fracture 
intensity of Rajawali mine to Tanjungenim coalfield, aperture distribution and comoon minimum 
fracture size is choosen, in this case the sampling numbers can be limited to 0.3 mm. 

Overall, cleat spacing and apertures examined from outcrops of these two locations provide 
preliminary information to cleat characteristics of Muaraenim formation. It can be assumed from 
these models that cleats formed in Tanjungenim and Rajawali have similarities and differences, 
with apertures and spacing in Rajawali coal mine are larger than in Tanjungenim coalfield. Many 
factors can be contributed to such results, including observation methods and numbers of data 
obtained. Detailed observations of cleat in outcrops and more varied data are needed to established 
wider assumptions towards cleat characteristics of South Sumatera Basin region. 

Conclusions 

ID Scanline methods and statistical analysis were used to further established influences of 
opening-mode fractures (cleats) towards permeability in Muaraenim coal formation. Cleat 
aperture, spacing, height, and orientation data can be gathered from outcrops of coal seams with 
limitations only to measurements size of each fractures formed on roof or wall of coal bed. Scan 
lines from Rajawali coal mine and Tanjungenim coalfield were correlated to distribution of cleats. 
Cumulative cleat spacing from each locations varies between 0.1 cm and 1.12 cm to 3.9 cm and 
7.89 cm. Cleat aperture analysis also showed more similar measurements between each locations 
from minimum of 0.1 mm and 0.3 mm to maximum of 1.5 cm and 3.9 cm. Observation of coal 
seams implies sizeable cleat pathways for fluid flow and poor distribution of spacing in butt cleats 
than in face cleats. 

It is important to gather more data and correlate cleats observation on various locations in South 
Sumatra basin to help determine characterization of cleat attributes that allows to permeability 
estimation within coal seams and indirectly establish parameters of cleat development process and 
prospects of coal seam gas in South Sumatra basin. 
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Abstract 

Geochemical data is widely used to identify precious metals in mineral exploration or detect 
contamination in environmental assessments. A geochemical soil survey as part of the North 
American Soil Geochemical Landscapes Project (NASGLP) was conducted at sites (n=560) across 
all provinces in Canada with sampling of the 0-5 cm, A, B, and C horizons. Geochemical analysis 
by ICP-MS provided elemental data. For the purposes of this study, the major elements from the 
<2 mm size fraction of the C horizon were selected for examination. The objectives of this study 
were to evaluate the Canadian geochemical data of the NASGLP through exploratory data analysis 
and to determine if geochemical data can infer qualitative soil mineralogy at sites across Canada. 
An evaluation of the geochemical data included univariate and multivariate statistical methods. 
Results indicated that log-ratio principal component analysis of the geochemical data consist of 
three distinct groups of Ca, Mg; Fe, Ti, Mn; and Al, K, Si while Na and P have smaller variability 
independent of the groups. The groupings were interpreted to represent soil minerals based on 
regional bedrock geology across Canada. 

Introduction 

The North American Soil Geochemical Landscapes Project (NASGLP) was a tri-national survey 
undertaken by the United States, Canada, and Mexico. The project aimed to collect soil samples 
under consistent field sampling and chemical analysis protocols to have a continent wide database 
of soil geochemical and mineralogy data. Canada participated in field seasons from 2007 to 2009 
but did not finish sampling as priorities changed for the Geological Survey of Canada. 

The objectives of this study were to evaluate the geochemical data from the Canadian sites of the 
NASGLP through exploratory data analysis, to examine patterns and spatial variability in the data, 
and to determine if geochemical data can infer qualitative soil mineralogy at sites across Canada. 

Methods 

Site selection under NASGLP was designed to provide consistent site density and resulted in 
13,487 sites across North America and specifically 6,018 sites in Canada. However, Canada only 
sampled 577 sites, approximately 10% of the intended sampling was completed (Friske et al., 
2013a). 
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In the field, a 60 cm wide by 70 cm long pit was excavated and soils were sampled from the 
following depths 0-5 cm. A, B, and C horizons. In the laboratory, soil samples were air dried 
from each horizon and split into <2 mm and <0.63 pm size fractions. A 4-acid (HC1, HNO 3 , HCIO 4 
and HF in a mixed ratio of 2:2:1:1) ‘near-total’ digestion was used on the two size fractions before 
analysis of 58 major and trace elements by ICP-MS and ICP-AS. Organic matter was estimated as 
Loss on Ignition (LOI) at 500°C (Friske et al., 2013b). 

This study concentrated on geochemical data from the C horizon for the <2 mm size fraction as 
the C horizon consists of the most unaltered soil, has the least amount of organic matter, and 
closely resembles the bedrock geology. Moreover, it was assumed that the C horizon would 
provide a reliable interpretation of the soil mineralogy. The <2 mm size fraction has better element 
detection when analyzing soil samples to allow for accurate geochemical concentrations (Klassen, 
2009). 

The current study focused on the major elements (Al, Ca, Fe, K, Mg, Mn, Na, P, Si, Ti) as the 
majority of soil minerals are composed of these elements. The element concentrations were 
converted into oxides and LOI was included in the dataset. The number of sites across all Canadian 
provinces with available data was 560 (Figure 1). 



Fig. 1. Location of Canadian sites with C horizon, <2 mm size fraction, data (n=560) from the 
North American Soil Geochemical Landscapes Project. 

Exploratory data analysis was used to reveal trends in the data to indicate soil mineralogy at 
individual sites. As the oxide data is compositional in nature, which sums to a constant (e.g. 1, 
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100), the data needed to be transformed by a centred log-ratio to avoid spurious behaviour when 
applying statistical methods. 

Data analysis was conducted through univariate and multivariate methods. Quantitative 
measurements of untransformed oxides were statistically summarized. Distribution of 
untransformed and transformed data were visually assessed by graphical measures (e.g. boxplots, 
histograms, and scatterplot matrices). 

A prominent exploratory data technique is principal component analysis (PCA) where the 
dimensions of a large, transformed dataset are reduced in order to describe the variation in the 
data. The number of principal components were determined by linear combinations from a 
correlation matrix. The principal components were shown on a biplot with the variables (in this 
case, the oxides) and sample sites to reveal information about the soil mineralogy (Grunsky, 2010). 
The principal component scores for each site were mapped to evaluate the spatial patterns of the 
oxides and if correlates to regional bedrock geology. 

Results and discussion 

The 560 survey sites covered a broad area across Canada, which encompassed six different 
ecozones. The proportion of sites in the Atlantic Maritime ecozone was 38%, Boreal Plains 4%, 
Boreal Shield 21 %, Mixedwood Plains 18%, Montane Cordillera 2%, and Prairies 18%. In Canada 
the bedrock geology is diverse and entails stratigraphic rock formations of igneous, metamorphic, 
and sedimentary from a multitude of different time periods. The majority of bedrock geology at 
the sampling sites was sedimentary (clastic and carbonate rocks) at 79%, igneous geology at 16%, 
and metamorphic geology at 5%. 

The oxide that contributed the most to the soil composition was Si02 and the least was MnO (Table 
1). The distribution of the data was evaluated by the coefficient of variation (CV) ratio of the 
standard deviation to the mean; the highest CV oxide was observed for CaO which indicated a 
large dispersion in the data. An observed difference between the median and mean values indicated 
skewed data as shown by CaO, LOI, and MgO (Table 1). 

Strong positive correlations of the log-centred major oxides were observed between Si-Al, Si-K, 
Ti-Al, Ti-Fe, Al-Fe, Al-K, and Fe-Mn oxides (Figure 2). Strong negative associations occurred 
between Al-Mg, Al-Ca, Fe-Ca, and Ca-K oxides. One unique relationship was MgO-CaO as the 
plot showed a curved pattern that could be attributed to the different clusters in the data. These 
clusters could be due to the regional bedrock geology influences of Western Canada’s sedimentary 
rocks versus Eastern Canada’s diversity of igneous, metamorphic, and sedimentary rocks. 
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Fig. 2. Scatterplot matrix of log-centred C-horizon, <2 mm size fraction, major oxide data (n=560) 
from the North American Soil Geochemical Landscapes Project. 
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Table 1. Statistical summary of untransformed C-horizon, <2 mm size fraction, major oxide data 


from the' 

North American Soil 

[ Geochemical Landscapes Pro 

ject. 


Min 

25% 

Median 

Mean 

75% 

Max 

CV 

Si0 2 

29.09 

65.75 

70.84 

70.06 

75.20 

90.43 

0.11 

Ti0 2 

0.06 

0.42 

0.57 

0.58 

0.72 

2.60 

0.43 

AI 2 O 3 

2.96 

9.78 

11.55 

11.52 

13.26 

19.02 

0.23 

Fe203 

0.61 

2.97 

4.22 

4.34 

5.57 

10.87 

0.41 

MnO 

0.01 

0.05 

0.07 

0.08 

0.10 

0.74 

0.68 

MgO 

0.12 

1.06 

1.58 

2.19 

2.56 

19.83 

0.92 

CaO 

0.03 

0.35 

1.39 

3.70 

4.39 

29.41 

1.42 

K 2 0 

0.67 

1.81 

2.14 

2.27 

2.74 

4.64 

0.29 

Na 2 0 

0.18 

1.10 

1.49 

1.66 

2.06 

4.15 

0.49 

P 2 O 5 

0.02 

0.08 

0.12 

0.12 

0.14 

1.15 

0.59 

LOI 

0.03 

1.93 

2.86 

3.48 

4.26 

18.10 

0.76 


The first three principal components were used for interpretation of soil mineralogy. The three 
components account for 70.34% of the variation in the dataset with the proportion of the first 
principal component at 39.28%, the second 21.26%, and the third 9.81%. Principal component 1 
(PCI) had AI 2 O 3 eigenvector at 0.89 and almost completely opposite was CaO at -0.90. Principal 
component 2 (PC2) had the LOI eigenvector at 0.66 with Na20 at -0.70. Principal component 3 
(PC3) eigenvectors have P 2 O 5 at 0.68 and -0.40 from LOI. 

The selected biplot of PCI and PC2 showed patterns in the data with a group of oxides (Fe 203 , 
MnO, Ti02) in the positive x- and y- axes (Figure 3). These oxides suggests soil minerals that have 
potentially undergone weathering from igneous and metamorphic rocks (e.g. iron oxides). A 
second configuration in the positive x-axis and the negative y-axis displayed Si, Al, K, and Na 
oxides with large variability, which could indicate minerals of the feldspar family (e.g. plagioclase 
and K-feldspar). Opposite to most of the other oxides are CaO and MgO on the negative x-axis of 
PCI. This indicates the variability of the soil minerals with Ca and Mg due to the extensive range 
of bedrock geology, especially sedimentary rocks, across Canada that underlie the sampling sites. 
The phosphorous oxide showed the smallest amount of variability in PCI to suggest it had less 
influence on the soil mineralogy. Similarly, LOI is follows along the positive x-axis of PCI and 
does not directly impact the soil mineralogy. 
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Log-centred C horizon major oxides 



- 0.1 0.0 0.1 
PCI 

Fig. 3. Biplot of principal component 1 versus 2 of log-centred C horizon, <2 mm size fraction, 
major oxide data (n=560) from the North American Soil Geochemical Landscapes Project. 

Principal component 1 scores were spatially mapped for the individual sampling site locations 
(Figure 4); the smallest components (green) corresponded to the biplot of PCI which showed CaO 
and MgO on the negative axis. This correlates to areas of sedimentary bedrock geology with 
minerals that are higher in Ca and Mg (e.g. calcite, dolomite). Accordingly, regions in the 
provinces of Saskatchewan, Manitoba, and Ontario have sedimentary bedrock geology of this 
nature. In comparison, the higher components (pink, yellow) align more with areas of igneous and 
metamorphic bedrock geology. These regions include a large number of silicate minerals (e.g. 
feldspar, micas) across the Maritime Provinces (New Brunswick, Nova Scotia, Newfoundland, 
and Prince Edward Island) and the Precambrian Shield (northern region in Saskatchewan). 

The spatial variability of PC2 scores showed strong negative scores (green) in central to northern 
regions of Saskatchewan (Figure 5). The lowest PC2 score was Na20 which indicates silicate 
minerals rich in Na (e.g. albite). The strong positive PC2 score for LOI was represented in 
Manitoba, Newfoundland, and New Brunswick. Due to the loss of volatile material (e.g. water and 
carbon dioxide), this can correlate to carbonate minerals, especially evident in Manitoba. 
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Fig. 4. Map of principal component 1 scores at sampling sites for log-centred C-horizon, <2 mm 
size fraction, major oxide data (n=560) from the North American Soil Geochemical Landscapes 
Project. 



Fig. 5. Map of principal component 2 scores at sampling sites for log-centred C-horizon, <2 mm 
size fraction, major oxide data (n=560) from the North American Soil Geochemical Landscapes 
Project. 
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Conclusion 

Exploratory data analysis undertaken in this study revealed relationship of the major oxides. In 
particular, the association of CaO and MgO showed distinct clusters due to the variation of 
sedimentary geology between Eastern and Western Canada. Soil oxide data can be used to infer 
broad soil mineralogy from principal component analysis. The results showed soil minerals can be 
generally classified as carbonates, silicates, and weathered oxides. The maps of principal 
component scores showed the major oxides have, unsurprisingly, been influenced by regional 
bedrock geology at the sampling sites. Regions in Saskatchewan, Manitoba, and Ontario contain 
carbonate minerals while silicate minerals are dominate in the Maritime Provinces. 
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Abstract 

In modem times wireless connections via networks grow rapidly. Most of people use wireless 
networks like WiFi at work, at home and also when they are outside. Problem of efficiency of 
wireless network is currently very important especially when users want to have stable connection 
to network. Additional challenge is to create and maintain reliable WiFi network that is open, it 
means every user could use such network if only have a range. Example of such network is 
university network where users are very specific group. Author’s proposition is to use 
geostatistical methods to create models and predict behavior of WiFi open network on whole 
considered area where it reach its range. Therefore, in this paper efficiency of open WiFi network 
named PWR-WiFi will be considered. PWR-WiFi is located in building belonging to the main 
campus of Wroclaw University of Science and Technology (WUST) in Poland where there are 
two deaneries, few lecture halls, many laboratories, two libraries and also administrative and 
researcher’s offices. Thus most of users of considered wireless network are students and 
employees of WUST like lecturers or administrative workers. Data for analysis are from three 
consecutive years 2014-2016 to better show the dynamic growth of number of PWR-WiFi network 
users. Parameter, which most reflecting behaviour of PWR-WiFi network is the number of users, 
obtained from Access Points (APs), that was investigated during research. Preliminary and 
structural analysis with approximation of variogram models were made and as the next step spatial 
prediction models of wireless network efficiency were performed by Turning Bands geostatistical 
simulation method. Three models of spatial prediction were prepared for three consecutive years 
2014, 2015 and 2016. After that the results were compared with spatial prediction models created 
previously by ordinary kriging estimation method. 

Keywords. Spatial prediction. Turning Bands method, wireless network, geostatistical methods 

1 Introduction 

Prediction of wireless network efficiency was discused in many publications. For example in 
(Katsaros and Manolopoulos, 2009) was presented the issues of location and request prediction in 
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wireless networks characterizing them as a discrete sequence prediction problems, and surveyed 
the major Markovian prediction methods. 

Analysis of Wi-Fi performance data for a Wi-Fi throughput prediction approach was made in (Pan, 
2017). Author implemented a Wi-Fi parameter visualization tool to show users' Wi-Fi performance 
in a graphic way. In this tool, machine learning method is used for Wi-Fi performance analysis to 
predict Wi-Fi throughput. A SVM-based classification model is proposed to work as a prediction 
function which takes Wi-Fi parameters both for target AP and nearby interference APs as input, 
and output is categorized as Wi-Fi throughput: good, medium, poor or very poor. 

(Kulkami et al., 2011) propose a simple traffic prediction mechanism using the Recursive Least 
Squares (RLS) algorithm awhich does not make any stationarity assumptions on the underlying 
time series and hence is able to operate on the raw data. Results prepared on real data performance 
evaluation show that RLS algorithm is capable of delivering accurate predictions and shows good 
adaptive behaviour at the same time being intuitively simple and lightweight from an 
implementation perspective. 

In (Odabasi, 2016) Generalized Regression Neural Networks (GRNNs) approach was employed 
in order to predict the output, packets dropped of a sample DMesh network simulation. Authors 
observed that some of considered parameters such as: traffic density and number of channels used, 
have a direct impact on error rate of the regression model. As result the high variance proved that 
GRNN approach can represent real characteristics of DMesh architecture. 

(Prasad and Agrawal, 2010) proposed a generic framework to approach the problem of mobility 
prediction using Hidden Markov Models (HMM). Authors used a real dataset with information 
regarding APs, users and derived mobility information from it. The data mined from the traces was 
useful in predicting the users movement and may be used to allocate resources in the network. 

(Ananthi and Ranganathan, 2013) in their work presented a survey on mobility prediction schemes 
proposed for wireless networks such as: prediction used in routing protocol, mobility prediction 
based on mobile user’s behavior, Markov based prediction scheme, Mobility Prediction Algorithm 
Based on Dividing Sensitive Ranges, Autoregressive Hello protocol. Mobility prediction using 
Neural Network and Bayesian network. 

On basis of rewiev of literature it could be claimed that no one till now using geostatistical methods 
to prediction efficiency of wireless networks. Althought, Author will be present in this paper 
Turning Bands method (TBM) to spatial prediction of WiFi network. It could be next step in 
Authors reserach, beacuse previous using geostatistical estimation and simulation methods to 
spatial prediction but wired not wireless network (for example Borzemski L. and Kaminska- 
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Chuchmala, 2012a; Borzemski L. and Kaminska-Chuchmala, 2012b; Borzemski L. and 
Kaminska-Chuchmala, 2013; Kaminska-Chuchmala, 2014). 

2 Turning Bands Method 

One of the most popular method from geostatistical simulation is Turning Bands method. It was 
used for the first time by (Chentsov, 1957) in the special case of Brownian random function. Next, 
it was developed by (Matem, 1960) and used for simulation by (Matheron, 1973). TBM is a 
stereologic tool used for reduction of multidimensional simulation to a one-dimensional one. The 
main idea of Turning Bands is to adding up a large number of independent simulations defined on 
lines scanning the plane (Chiles and Delfiner, 2012). The TBM consists in the reduction of a 
Gaussian random function of covariance C to the simulation of an independent stochastic process 
of covariance Co. According to (Lantuejoul, 2002 ) let (9 n , n e N) be a sequence of directions S+ 
and let (X n , n e M) be a sequence of independent stochastic processes of covariance C e , then 
random function: 

Y Ul} (x) = - i r f j X k (<x,d k >), xe M 1 , (1) 

yin k =l 

assumes covariance equal to: 

c M (h) = -f j c dt (<h,6 k >)- (2) 

n k =1 

Turning Bands algorithm is given below: 

1. Transform input data using Gaussian anamorphosis. 

2 . Choose a set of directions 0„ such that —V" S 0 ~ zu . 

n k ~ x k 

3. Generate independent standard stochastic processes with covariance functions 

C C 

'-' 0 , ■ 

4. Compute — 7 =^”,, X k (< x,6 k >) for any x eD. 

yin 

5. Make kriged estimate y*(x) = £ c A c (x)y(c) for each x e D. 

6 . Simulate a Gaussian D and z random function with mean 0 and covariance C in domain D 
on condition points. Let (z(x), x e (c), c eC) be the obtained results. 

7. Make kriged estimate z*(x) = A c (x)z(c)for each x eD. 

8. Obtain the result (y*(x) + z(x) — z* (x), x G D). 

9. Perform a inverse Gaussian transformation to return to the original data. 

3 Experiment Background 
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The data considered in presented research were collected from open WiFi network named PWr- 
WiFi. This wireless network is located in the main campus at (WUST). PWR-WiFi network using 
the standard IEEE802.11 of wireless infrastructure. The data collected to this research are obtained 
from eleven AP’s given in five-storey building (fig. 1), named B4. APs work with using frequency 
2.4 GHz in IEEE 802.1 lb/g/n standards and 5 GHz in IEEE 802.1 la/n standards. APs contained 
in PWR-WiFi network are wireless connected to switch and configured to get IP address from 
network and connecting to WiFi controller by LWAPP (Light Weight Access Point Protocol) 
protocol. 



Fig. 1 (a) B4 building located in main campus of WUST (b) projection of localization APs in B4 

The analysed data were obtained from passive experiment (real data) which were taken from 14 th 
- 29 th April over three consecutive years: 2014, 2015 and 2016 collected every hour between 7:00 
AM and 9:00 PM. Examined APs are installed in B4 building as follows: one on first floor, one 
on second floor, two on third floor, five on fourth floor, and two on fifth floor. All analysis and 
prediction presented in this paper were performed in R language under R environment in version 
3.4.4 which is available as Free Software on GNU licence (R Core Team, 2018). Moreover, 
prediction with geostatistical TBM was made by using RGeostats package (Renard et al., 2018). 


4 Preliminary and Structural Analysis of PWR-WiFi Data 

Examined PWR-WiFi network is an open university wireless network where most of users are 
students and employees of WUST like lecturers or administrative workers. In considered B4 
building there are two deaneries, few lecture halls, many laboratories, two libraries and also 
administrative and researcher’s offices. Three databases from years 2014-2016 which are 
examined during research contain every hour measurement from 16 days between 7:00 AM and 
9:00 PM each day. This part of day was chosen because of B4 building is closed by night and main 
traffic in network is during office hours and lectures. Mainly, the first classes are started at 7:30 
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AM and the last one finished at 8:30 PM. Generally, during the day, classes are started quarter 
after an odd hour. This specific character of PWR-WiFi network could be seen in figure 2, where 
data of number of users are presented for whole examined period of 2015. This figure shows 
periodical behaviour of network users (especially students). It have to be mentioned that this 
considered part of month April is a time in semester where students coming regularly on classes 
thus regularly need to access to the wireless network. 



Fig. 2 Number of users served by 11 APs in B4 building between 14 th and 29 th April 2015 

Basics statistics are presented in table 1. Its cover three years 2014-2016. The maximum number 
of users is equal 90 in 2016 and the minimum equals 54 in 2014. It could be noticed growing trend 
of number of users. Mean value also confirms this trend, because every year mean number of users 
is higher by about 4 to 6 users. Variance of data is also growing up what showing variability of 
this process and significant data differentiation. Furthermore standard deviation is also the largest 
in 2016 and equals 14.69. 


Table 1. Basic statistics of number of users for all considered APs located in building within 
three years 


Parameters 

14-29.04.2014 

14-29.04.2015 

14-29.04.2016 

Min number of users 

0 

0 

0 

Max number of users 

54 

72 

90 

Mean number of users 

4 

10 

14 

Variance 

31.99 

158.94 

215.80 

Standard deviation 

5.66 

12.61 

14.69 


Histogram of number of users in 2014 served by 11 Access Points is presented in figure 3. The 
highest frequency of users is in between 0 to 10 and also in the other bin between 10 to 20. The 
histogram is skewed left. 
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Fig. 3 Histogram of number of users in PWR-WiFi network in B4 building in 2014 
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Fig. 4 Gaussian anamorphosis for number of users in PWR-WiFi network in 2015 



First step in structural analysis is performing of Gaussian anamorphosis. For all datasets it was 
made by using 100 Hermite polynomials. 
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Model of variogram of network users in 2015 



Fig. 5 Models of variograms calculated in four directions for number of PWR-WiFi wireless 
network users in 2015 

The second step in structural analysis in geostatistical approach is to calculating variograms and 
approximate them with appropriate theoretical model or models. For each of databases from three 
years 2014-2016 variograms were calculated in four different direction: 0,45, 90, 135 degrees. As 
example variogram of number of users in 2015 is presented in figure 5. Distance lags for each 
calculation direction equals 5. The number of lags for each calculation direction is 15. For all 
directions it could be seen the nugget effect. The range of variogram function for two directions 
was about 30 meters and for two others direction almost 60 meters. In the next step variogram 
function was approximated by theoretical functions: nugget effect, exponential, and spherical. 

5 Spatial Prediction Models of PWR-WiFi Network Efficiency Simulated by TBM 

Models have 3 dimensions (x and y geographical coordinates and z the altitude coordinates) and 
prediction cover the space where spreads the range of signal from APs belonging to PWR-WiFi 
wireless network in building B4. Three models of spatial prediction were prepared for three 
consecutive years 2014, 2015 and 2016 and contain: Gaussian anamorphosis models, theoretical 
models of variogram approximation and moving neighborhood. The moving neighbourhood 
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search is performed by angular sectors and the neighbourhood ellipsoid is anisotropic. The search 
ellipsoid has a three dimensions. 

Results of spatial predictions by Turning Bands simulation method are presented in table 2. 
Maximum number of users is the highest in 2016 and equals 90 as well as variance and standard 
deviation. Growing trend in prediction models for whole considered area (all building B4) is also 
visible. 

Table 2 Statistics of simulated by TBM number of users of PWR-WiFi network 


Parameters 

14-29.04.2014 

14-29.04.2015 

14-29.04.2016 

Min number of users 

0 

0 

0 

Max number of users 

54 

72 

90 

Mean number of users 

10 

12 

13 

Variance 

92.77 

139.01 

224.55 

Standard deviation 

9.63 

11.79 

14.98 


In figure 6 there are presented raster maps of mean number of PWR-WiFi users simulated by TBM 
in years 2014-2016. Localization of more concentrations of users such as students, lecturers or 
administrative workers are similar for all years. Probably it is related to the fact that it is nearby 
(depends of floor in the building): lecture hall, library or deanery. Unfortunately in 2015 and 2016 
3 APs were disabled thus in some areas on maps there are less users. 




Fig. 6 Scatter 3D plot of number of PWR-WiFi wireless network users in B4 building at WUST 
in (a) April of 2014 (b) April of 2015 (c) April of 2016 
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Author prepared also spatial predictions on these considered databases by using kriging estimation 
method (Kaminska-Chuchmala, 2018). Comparison of results from spatial prediction models made 
by different geostatistical methods is presented in figure 7. Obtained values confirmed 
characteristic feature of estimation method, it means smoothing. In these cases, in wireless network 
prediction of efficiency models, more realistic data are desirable and reflect the better character of 
the phenomenon being studied. 
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a) Number of PWR-WiFi users 

estimated by kriging 
in B4 buliding in April of 2014 
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Fig. 7 Comparison of raster maps of number of PWR-WiFi users in B4 building in April 2014 (a) 
estimated by kriging (b) simulated by Turning Bands 


7 Conclusions 

Spatial (3D) prediction models of PWR-WiFi wireless network efficiency within three years: 2014, 
2015, and 2016 with using Turning Bands geostatistical simulation method were presented. Before 
prediction, preliminary and structural analysis of data had been discussed. These kind of spatial 
predictions, especially obtained raster maps, could be very helpful for administrators of networks. 

In future, Author plan to perform space-time (4D) prediction of PWR-WiFi wireless network with 
using more parameters like channel utilization. Models of prediction will be performed with using 
not only geostatistical estimation (Kriging) methods and simulation methods (Turning Bands), but 
also with other geostatistical methods like Sequential Gaussian Simulation. 
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